Abstract

Static analysis designers must carefully balance precision and efficiency. In our experience, many static analysis tools are built around an elegant core algorithm, but that algorithm is then extensively tweaked to add just enough precision for the coding idioms seen in practice, without sacrificing too much efficiency. There are several downsides to adding precision in this way: the tool's implementation becomes much more complicated; it can be hard for an end-user to interpret the tool's results; and as software systems vary tremendously in their coding styles, it may require significant algorithmic engineering to enhance a tool to perform well in a particular software domain.

In this paper, we present Mix, a novel system that mixes type checking and symbolic execution. The key aspect of our approach is that these analyses are applied independently on disjoint parts of the program, in an off-the-shelf manner. At the boundaries between nested type checked and symbolically executed code regions, we use special mix rules to communicate information between the off-the-shelf systems. The resulting mixture is a provably sound analysis that is more precise than type checking alone and more efficient than exclusive symbolic execution. In addition, we describe a prototype implementation, Mixy, for C. Mixy checks for potential null dereferences by mixing a null/non-null type qualifier inference system with a symbolic executor.

Categories and Subject Descriptors D.2.4 [Software Engineering]: Software/Program Verification; D.2.5 [Software Engineering]: Testing and Debugging (Symbolic execution); F.3.2 [Logics and Meanings of Programs]: Semantics of Programming Languages (Program analysis)

General Terms Languages, Verification

Keywords Mix, mixed off-the-shelf analysis, symbolic execution, type checking, mix rules, false alarms, precision

1. Introduction

All static analysis designers necessarily make compromises between precision and efficiency. On the one hand, static analysis must be precise enough to prove properties of realistic software systems, and on the other hand, it must run in a reasonable amount of time and space. One manifestation of this trade-off is that, in our experience, many practical static analysis tools begin with a relatively straightforward algorithm at their core, but then gradually accrete a multitude of special cases to add just enough precision without sacrificing efficiency.

Some degree of fine tuning is inevitable (the undecidability of static analysis means that analyses must be targeted to programs of interest), but an ad-hoc approach has a number of disadvantages: it significantly complicates the implementation of a static analysis algorithm; it is hard to be sure that all the special cases are handled correctly; and it makes the tool less predictable and understandable for an end-user, since the exact analysis algorithm becomes obscured by the special cases. Perhaps most significantly, software systems are extremely diverse, and programming styles vary greatly depending on the application domain and the idiosyncrasies of the programmer and her community's coding standards. Thus an analysis that is carefully tuned to work in one domain may not be effective in another domain.

In this paper, we present Mix, a novel system that trades off precision and efficiency by mixing type checking, a coarse but highly scalable analysis, with symbolic execution [King 1976], which is very precise but inefficient. In Mix, precision versus efficiency is adjusted using typed blocks {t e t} and symbolic blocks {s e s} that indicate whether expression e should be analyzed with type checking or symbolic execution, respectively. Blocks may nest arbitrarily to achieve the desired level of precision versus efficiency.

The distinguishing feature of Mix is that its type checking and symbolic execution engines are completely standard, off-the-shelf implementations. Within a typed or symbolic block, the analyses run as usual. It is only at the boundary between blocks that we use special mix rules to translate information back and forth between the two analyses. In this way, Mix gains precision at limited cost, while potentially avoiding many of the pitfalls of more complicated approaches.

As a hypothetical example, consider the following code:

1  {s
2    if (multithreaded) {t fork(); t}
3    {t ... t}
4    if (multithreaded) {t lock(); t}
5    {t ... t}
6    if (multithreaded) {t unlock(); t}
7  s}

This code uses multiple threads only if multithreaded is set to true. Suppose we have a type-based analysis that checks for data races. If the analysis is path insensitive, it cannot tell whether a thread is created on line 2, and it does not know the lock state after lines 4 and 6; all of these will lead to false positives.

Rather than add path sensitivity to our core data race analysis, we can instead use Mix to gain precision. We wrap the program in a symbolic block at the top level so that the executions for each setting of multithreaded will be explored independently. Then, for performance, we wrap all the other code (lines 3 and 5 and the calls to fork, lock, and unlock) in typed blocks, so that they are analyzed with the type-based analysis. In this case, these block annotations effectively cause the type-based analysis to be run twice, once for each possible setting of multithreaded; by separating those two cases, we avoid conflation and eliminate the false positives.

While Mix cannot address every precision/efficiency tradeoff issue (for example, the lexical scoping of typed and symbolic blocks is one limitation), there are nonetheless many potential applications. Among other uses, Mix can encode forms of flow sensitivity, context sensitivity, path sensitivity, and local type refinements. Mix can also use type checking to overcome some limitations of symbolic execution (Section 2). For the purposes of this paper, we leave the placement of block annotations to the programmer, but we envision that an automated refinement algorithm could heuristically insert blocks as needed. In this scenario, Mix becomes an intermediate language for modularly combining off-the-shelf analyzer implementations.

In this paper, we formalize Mix for a small imperative language, mixing a standard type checking system with symbolic execution to yield a system that checks for the absence of run-time type errors. Thus, rather than checking for assertion failures, as a typical symbolic executor might do, our formal symbolic executor reports any type mismatches it detects. To mix these two systems together, we introduce two new rules: one rule in the type system that “type checks” blocks {s e s} using the symbolic executor, and one rule in the symbolic executor that “executes” blocks {t e t} using the type checker. We prove that the type system, the symbolic executor, and the mix of the two systems are sound. The soundness proof of Mix uses the proofs of type soundness and symbolic execution soundness essentially as-is, which provides some additional evidence of a clean modularization. Additionally, two features of our formalism for symbolic execution may be of independent interest: we discuss the tradeoff between “forking” the symbolic executor and giving more work to the solver; and we provide a soundness proof, which, surprisingly, we have been unable to find for previous symbolic execution systems (Section 3).

Finally, we describe Mixy, a prototype implementation of Mix for C. Mixy combines a simple, monomorphic type qualifier inference system (a reimplementation of Foster et al. [2006]) with a C symbolic executor. Two key challenges arise when mixing type inference rather than checking: we need to perform a fixed-point computation as we switch between typed and symbolic blocks, since data values can pass from one to the other and back; and we need to integrate aliasing information into our analysis so that pointer manipulations performed within symbolic blocks correctly influence typed blocks. Additionally, we extend Mixy to support caching block results as well as recursion between blocks. We use Mixy to look for null pointer errors in a reasonably-sized benchmark, vsftpd; we found several examples where adding symbolic blocks can eliminate false positives compared to pure type qualifier inference (Section 4).

We believe that Mix provides a promising new approach to trading off precision and efficiency in static analysis. We expect that the ideas behind Mix can be applied to many different combinations of many different analyses.

2. Motivating Examples

Before describing Mix formally, we examine some coding idioms for which type inference and symbolic execution can profitably be mixed. Our examples are written in either an ML-like language or a C-like language, depending on which one is more natural for the particular example.

Path, Flow, and Context Sensitivity. In the introduction, we saw one example in which symbolic execution introduced a small amount of path sensitivity to type inference. There are several potential variations on this example where we can locally add a little bit of path sensitivity to increase the precision of type checking. For example, we can avoid analyzing unreachable code:

{t ... {s if true then {t 5 t} else {t "foo" + 3 t} s} ... t}

This code runs without errors, but pure type checking would complain about the potential type error in the false branch. However, with these block annotations added in Mix, the symbolic executor will only invoke the type checker for the true branch and hence will avoid a false positive.
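A minimal OCaml sketch of this contrast follows. It assumes a toy expression language of our own (the constructors and the functions typecheck and sym_check are hypothetical, not Mix's rules), and its symbolic side only understands literal guards such as true, standing in for a real executor plus solver. Plain type checking visits both branches and rejects the program; the symbolic side follows only the branch that can run and hands that typed block to the type checker.

```ocaml
(* Sketch with hypothetical names; not Mix's implementation. *)
type ty = TInt | TStr | TBool

type expr =
  | Int of int
  | Str of string
  | Bool of bool
  | Plus of expr * expr
  | If of expr * expr * expr
  | TBlock of expr (* {t e t} *)
  | SBlock of expr (* {s e s} *)

exception Type_error of string

(* Path-insensitive type checking: both branches must type check and agree. *)
let rec typecheck e =
  match e with
  | Int _ -> TInt
  | Str _ -> TStr
  | Bool _ -> TBool
  | Plus (a, b) ->
      if typecheck a = TInt && typecheck b = TInt then TInt
      else raise (Type_error "+ expects int operands")
  | If (c, t, f) ->
      if typecheck c <> TBool then raise (Type_error "guard is not bool");
      let tt = typecheck t and tf = typecheck f in
      if tt = tf then tt else raise (Type_error "branches disagree")
  | TBlock e' -> typecheck e'
  | SBlock e' -> sym_check e'

(* Stand-in for the symbolic executor: with a literal guard, only the branch
   that can actually run is explored; everything else goes to the checker. *)
and sym_check e =
  match e with
  | If (Bool true, t, _) -> sym_check t
  | If (Bool false, _, f) -> sym_check f
  | TBlock e' -> typecheck e'
  | SBlock e' -> sym_check e'
  | other -> typecheck other
```

With ex = If (Bool true, TBlock (Int 5), TBlock (Plus (Str "foo", Int 3))), typecheck ex raises Type_error, while typecheck (SBlock ex), which corresponds to the {s ... s} annotation, returns TInt.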

We can also use symbolic execution to gain some flow sensitivity. For example, in a dynamically-typed imperative language, programmers may reuse variables at different types, such as in the following:

{t ... {s var x = 1; {t ... t}; x = "foo"; s} ... t}

Here the local variable x is first assigned an integer and is later reused to refer to a string. With the annotations above, we can successfully check such code statically, using the symbolic executor to distinguish the two different assignments to x and then type checking the code in between.

Similar cases can occur if we try to apply a non-standard type system to existing code. For example, in our case study (Section 4.5), we applied a nullness checker based on type qualifiers to C. We found some examples like the following code:

{t ... {s x->obj = NULL;
          x->obj = (...)malloc(...); s} ... t}

Here x->obj is initially assigned NULL, immediately before being assigned a freshly allocated location. A flow-insensitive type qualifier system would think that x->obj could be NULL after this pair of assignments, even though it cannot be.

Finally, we can also use symbolic execution to gain context sensitivity, though at the cost of duplicate work. For example, in the following:

{s let id x = x in {t ... {s id 3 s} ... {s id 3.0 s} ... t} s}

the identity function id is called with an int and a float. Rather than adding parametric polymorphism to type check this example, we could wrap those calls in symbolic blocks, which in Mix causes the calls to be checked with symbolic execution. While this is likely not useful for standard type checking, for which parametric polymorphism is well understood, it could be very useful for a more advanced type system for which fully general parametric polymorphic type inference might be difficult to implement or perhaps even undecidable.

A combination of context sensitivity and path sensitivity is possible with Mix. For example, consider the following:

{s
  let div x y = if y = 0 then "err" else x / y in
  {t ... + {s div 7 4 s} t}
s}

where the div function may return an int or a string, but it returns a string (indicating an error) only when the second argument is 0. Note that this level of precision would be out of the reach of parametric polymorphism by itself.

Local Refinements of Data. Symbolic execution can also potentially be used to model data more precisely for non-standard type systems. As one example, suppose we introduce a type qualifier system that distinguishes the sign of an integer as either positive, negative, zero, or unknown. Then we can use symbolic execution to refine the type of an integer after a test:

{t
  let x : unknown int = ... in
  {s
    if x > 0 then {t (* x : pos int *) ... t}
    else if x = 0 then {t (* x : zero int *) ... t}
    else {t (* x : neg int *) ... t}
  s}
t}

Here, on entry to the symbolic block, x is an unknown integer, so the symbolic executor will assign it an initial symbolic value $\alpha_x$ ranging over all possible integers. Then at the conditional branches, the symbolic executor will fork and explore the three possibilities: $\alpha_x > 0$, $\alpha_x = 0$, and $\alpha_x < 0$. On entering the typed block in each branch, since the value of x is constrained in the symbolic execution, the type system will start with the appropriate type for x: pos, zero, or neg int, respectively.
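The refinement step can be sketched in OCaml under simplifying assumptions: the path condition for x is kept as a list of atomic facts about $\alpha_x$ compared with 0, and refine is a hypothetical helper that picks the qualifier a typed block would start from. A real implementation would ask a solver rather than pattern-match on atoms.

```ocaml
(* Sketch with hypothetical names; not Mix's implementation. *)

(* Sign qualifiers from the example. *)
type sign = Pos | Zero | Neg | Unknown

(* Atomic facts about alpha_x that the three-way test adds to the path
   condition (hypothetical encoding). *)
type atom =
  | Gt0 (* alpha_x > 0 *)
  | Eq0 (* alpha_x = 0 *)
  | Not of atom

(* Pick the qualifier a typed block should start from, given the facts on
   one path. Only the shapes produced by the example are recognized. *)
let refine (path : atom list) : sign =
  let has a = List.mem a path in
  if has Gt0 then Pos
  else if has Eq0 then Zero
  else if has (Not Gt0) && has (Not Eq0) then Neg
  else Unknown

(* The three paths forked at "if x > 0 ... else if x = 0 ... else ...". *)
let paths = [ [ Gt0 ]; [ Not Gt0; Eq0 ]; [ Not Gt0; Not Eq0 ] ]
let refined = List.map refine paths
```

Here refined evaluates to [Pos; Zero; Neg], one qualifier per forked path, and an empty path condition leaves x at Unknown.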

As another example, suppose we have a type system to prevent data races in C. Then a common problem that arises is analyzing local initialization of shared data [Pratikakis et al. 2006]. Consider the following code:

{t
  {s
    x = (struct foo *) malloc(sizeof(struct foo));
    x->bar = ...;
    x->baz = ...;
    x->qux = ...;
  s}
  insert(shared_data_structure, x);
t}

Here we allocate a new block of memory and then initialize it in several steps before it becomes shared. A flow-insensitive type-based analysis would report an error because the writes through x occur without a lock held. On the other hand, if we wrap the allocation and initialization in a symbolic block, as above, symbolic execution can easily observe that x is local during the initialization phase, and hence the writes need not be protected by a lock.

Helping Symbolic Execution. The previous examples considered adding precision in type checking through symbolic execution. Alternatively, typed blocks can potentially be used to introduce conservative abstraction in symbolic execution when the latter is not viable. For example:

{s
  let x = {t unknown_function() t} in ...
  let y = {t 2**z (* operation not supported by solver *) t} in ...
  {t while true do {s loop_body s} t}
s}

The first line contains a call to a function whose source code is not available, so we cannot symbolically execute the call. However, if we know the called function's type, then we can wrap the call in a typed block (assuming the function has no side effects), conservatively modeling its return value as any possible member of its return type. Similarly, on the second line, we are performing an exponentiation operation; suppose the symbolic executor's solver cannot model this operation if z is symbolic. Then by wrapping the operation in a typed block, we can continue symbolic execution, again conservatively assuming the result of the exponentiation is any member of the result type. The third line shows how we could potentially handle long-running loops by wrapping them in typed blocks, so that symbolic execution would effectively skip over them rather than unroll them (possibly infinitely). We can also recover some precision within the loop body by further wrapping the loop body with a symbolic block.
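The effect of a typed block on symbolic execution can be sketched as follows, assuming a hypothetical exec_typed_block that is handed the type the type checker assigned to the wrapped expression (the type checking itself is not shown). The executor never looks inside the block; it continues with a fresh, unconstrained symbolic variable of that type, which is the conservative "any member of the result type" reading above.

```ocaml
(* Sketch with hypothetical names; not Mix's implementation. *)
type ty = TInt | TBool

(* A few symbolic value forms. *)
type sym =
  | Var of string * ty (* fresh symbolic variable alpha:tau *)
  | Const of int
  | Add of sym * sym

(* Supply of fresh symbolic variable names. *)
let counter = ref 0

let fresh (t : ty) : sym =
  incr counter;
  Var (Printf.sprintf "alpha%d" !counter, t)

(* At a typed block {t e t}, the executor does not look inside e. Given the
   type tau that the type checker assigned to e (passed in here), it
   continues with an unconstrained value of that type. *)
let exec_typed_block (tau : ty) : sym = fresh tau

(* let y = {t 2**z t} in y + 1, with 2**z typed as int *)
let y = exec_typed_block TInt
let y_plus_one = Add (y, Const 1)
```

Each call yields a distinct variable, so two unrelated typed blocks never accidentally share a value.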

3. The Mix System

In the previous section, we considered a number of idioms that motivate the design of Mix. Here, we consider a core language, shown in the top portion of Figure 1, with which we study the essence of switching blocks for mixing analyses. Our language includes variables x; integers n; booleans true and false; selected arithmetic and boolean operations $+$, $=$, $\neg$, and $\wedge$; conditionals with if; let bindings; and ML-style updatable references with ref (construction), ! (dereference), and := (assignment). We also include two new block forms, typed blocks {t e t} and symbolic blocks {s e s}, which indicate that e should be analyzed with type checking or symbolic execution, respectively. We leave unspecified whether the outermost scope of a program is treated as a typed block or a symbolic block; Mix can handle either case.

Figure 1. Program expressions, types, and symbolic expressions.
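For readers who prefer code to grammars, the core language above can be transcribed into an OCaml datatype. The constructor names are our own, and switches is a hypothetical helper included only to show that the two block forms sit in the syntax tree like any other expression.

```ocaml
(* Sketch with hypothetical names; a transcription of the core language. *)

(* Values v ::= n | true | false *)
type value = VInt of int | VBool of bool

type expr =
  | Var of string               (* x *)
  | Val of value                (* v *)
  | Plus of expr * expr         (* e + e *)
  | Eq of expr * expr           (* e = e *)
  | Not of expr                 (* negation *)
  | And of expr * expr          (* conjunction *)
  | If of expr * expr * expr    (* if e then e else e *)
  | Let of string * expr * expr (* let x = e in e *)
  | Ref of expr                 (* ref e *)
  | Deref of expr               (* !e *)
  | Assign of expr * expr       (* e := e *)
  | Typed of expr               (* {t e t} *)
  | Symbolic of expr            (* {s e s} *)

(* Count how many block boundaries (analysis switches) a program has. *)
let rec switches = function
  | Var _ | Val _ -> 0
  | Not e | Ref e | Deref e -> switches e
  | Plus (a, b) | Eq (a, b) | And (a, b) | Assign (a, b) | Let (_, a, b) ->
      switches a + switches b
  | If (c, t, f) -> switches c + switches t + switches f
  | Typed e | Symbolic e -> 1 + switches e

(* {s if true then {t 5 t} else {t 6 t} s} *)
let example =
  Symbolic (If (Val (VBool true), Typed (Val (VInt 5)), Typed (Val (VInt 6))))
```

For example, switches example is 3: one symbolic block enclosing two typed blocks.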

3.1 Type Checking and Symbolic Execution

Type checking for our source language is entirely standard, and so we omit those rules here. Our type checking system proves judgments of the form $\Gamma \vdash e : \tau$, where $\Gamma$ is the type environment and $\tau$ is $e$'s type. Grammars for $\Gamma$ and $\tau$ are given in the bottom portion of Figure 1.
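Since the typing rules are omitted as standard, here is a sketch of one conventional rendering for the pure fragment (variables, constants, $+$, $=$, $\neg$, $\wedge$, if, and let). It assumes that $=$ compares integers and represents $\Gamma$ as an association list; references and blocks are left out, and all names are ours.

```ocaml
(* Sketch with hypothetical names; the usual rules for the pure fragment. *)
type ty = TInt | TBool

type expr =
  | Var of string
  | Int of int
  | Bool of bool
  | Plus of expr * expr
  | Eq of expr * expr
  | Not of expr
  | And of expr * expr
  | If of expr * expr * expr
  | Let of string * expr * expr

exception Ill_typed of string

(* check gamma e returns tau such that Gamma |- e : tau, or raises. *)
let rec check (gamma : (string * ty) list) (e : expr) : ty =
  let expect t e' =
    if check gamma e' <> t then raise (Ill_typed "unexpected operand type")
  in
  match e with
  | Var x -> (
      match List.assoc_opt x gamma with
      | Some t -> t
      | None -> raise (Ill_typed ("unbound " ^ x)))
  | Int _ -> TInt
  | Bool _ -> TBool
  | Plus (a, b) -> expect TInt a; expect TInt b; TInt
  | Eq (a, b) -> expect TInt a; expect TInt b; TBool (* assumption: ints only *)
  | Not a -> expect TBool a; TBool
  | And (a, b) -> expect TBool a; expect TBool b; TBool
  | If (c, t, f) ->
      expect TBool c;
      let tt = check gamma t in
      if check gamma f <> tt then raise (Ill_typed "branches disagree");
      tt
  | Let (x, e1, e2) -> check ((x, check gamma e1) :: gamma) e2
```

Note that the If case checks both branches regardless of the guard; this is precisely the path insensitivity that symbolic blocks are meant to relax.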

The remainder of this section describes a generic symbolic executor. While the concept of symbolic execution is widely known, there does not appear to be a clear consensus on its definition. Thus, we make our definition of symbolic execution explicit here through a formalization similar to an operational semantics. Such a formalization enables us to describe the switching between type checking and symbolic execution in a uniform manner.

Symbolic Expressions, Memories, and Environments. The remainder of Figure 1 describes the symbolic expressions and environments used by our symbolic executor. Symbolic expressions are used to accumulate constraints in deferral rules. For example, the symbolic expression $(\alpha{:}\mathrm{int} + 3{:}\mathrm{int}){:}\mathrm{int}$ represents a value that is three more than the unknown integer $\alpha$.

Because we are concerned with checking for run-time type errors, in our system symbolic expressions $s$ have the form $u{:}\tau$, where $u$ is a bare symbolic expression and $\tau$ is its type. With these type annotations, we can immediately determine the type of a symbolic expression, just like in a concrete evaluator with values. As a shorthand, we use $g$ to represent conditional guards, which are just symbolic expressions with type bool. Bare symbolic expressions $u$ may be symbolic variables $\alpha$ (e.g., $\alpha{:}\mathrm{int}$ is a symbolic integer, and $\alpha{:}\mathrm{bool}$ is a symbolic boolean); known values $v$; or operations $+$, $=$, $\neg$, $\wedge$ applied to symbolic expressions of the appropriate type. Notice that our syntax forbids the formation of certain ill-typed symbolic expressions (e.g., $\alpha_1{:}\mathrm{int} + \alpha_2{:}\mathrm{bool}$ is not allowed).

Symbolic expressions also include symbolic memory accesses $m[u{:}\tau\ \mathrm{ref}]$, which represent an access through pointer $u$ in symbolic memory $m$. A symbolic memory may be $\mu$, representing an arbitrary but well-typed memory; $m, (s \mapsto s')$, a memory that is the same as $m$ except that location $s$ is updated to contain $s'$; or $m, (s \mapsto_a s')$, which is the same as $m$ except that newly allocated location $s$ points to $s'$. These are essentially McCarthy-style sel and upd expressions that allow the symbolic executor to accumulate a log of writes and allocations while deferring alias analysis. An allocation always creates a new location that is distinct from the locations in the base unknown memory, so we distinguish allocations from arbitrary writes.

Finally, symbolic environments $E$ map local variables $x$ to (typed) symbolic expressions $s$.
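The requirement that every symbolic expression carry its type can be enforced in code by making smart constructors the only way to build compound expressions. The sketch below, with hypothetical names and memory accesses omitted, rejects $\alpha_1{:}\mathrm{int} + \alpha_2{:}\mathrm{bool}$ at construction time and builds the example $(\alpha{:}\mathrm{int} + 3{:}\mathrm{int}){:}\mathrm{int}$ from the text.

```ocaml
(* Sketch with hypothetical names; memory reads m[u:tau ref] are omitted. *)
type ty = TInt | TBool | TRef of ty

(* Bare symbolic expressions u. *)
type bare =
  | Sym of string (* symbolic variable alpha *)
  | VInt of int
  | VBool of bool
  | Plus of sym * sym
  | Eq of sym * sym
  | Not of sym

(* Typed symbolic expressions s = u:tau. *)
and sym = { u : bare; ty : ty }

exception Ill_formed

(* Smart constructors: ill-typed forms are rejected as soon as they are built. *)
let var name ty = { u = Sym name; ty }
let num n = { u = VInt n; ty = TInt }
let boolean b = { u = VBool b; ty = TBool }

let plus a b =
  if a.ty = TInt && b.ty = TInt then { u = Plus (a, b); ty = TInt }
  else raise Ill_formed

let eq a b =
  if a.ty = TInt && b.ty = TInt then { u = Eq (a, b); ty = TBool }
  else raise Ill_formed

(* Guards g are exactly the symbolic expressions of type bool. *)
let not_ g = if g.ty = TBool then { u = Not g; ty = TBool } else raise Ill_formed

(* The example from the text: (alpha:int + 3:int):int *)
let three_more = plus (var "alpha" TInt) (num 3)
```

Because the type sits on every node, reading off the type of a symbolic expression is a field access, just as the text describes for concrete values.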

Symbolic Execution for Pure Expressions. Figure 2 describes our symbolic executor on pure expressions using what are essentially big-step operational semantics rules. The rules in Figure 2 prove judgments of the form

$$
E \vdash \langle S ; e \rangle \Downarrow \langle S' ; s \rangle
$$

meaning that, with local variables bound in $E$ and starting in state $S$, expression $e$ evaluates to symbolic expression $s$ and updates the state to $S'$. In our symbolic execution judgment, a state $S$ is a tuple $\langle g ; m \rangle$, where $g$ is a path condition constraining the current state and $m$ is the current symbolic memory. The path condition begins as true, and whenever the symbolic executor makes a choice at a conditional, we extend the path condition to remember that choice (more on this below). We write $X(S)$ for the $X$ component of $S$, with $X \in \{g, m\}$, and similarly we write $S[X \mapsto Y]$ for the state that is the same as $S$, except that its $X$ component is now $Y$.

Most of the rules in Figure 2 are straightforward and are intended to summarize typical symbolic executors. Rule SEVar evaluates a local variable by looking it up in the current environment. Notice that, as with standard operational semantics, no reduction is possible if the variable is not in the current environment. Rule SEVal reduces values to themselves, using the auxiliary function $\mathit{typeof}(v)$ that examines the value form to return its type (i.e., $\mathit{typeof}(n) = \mathrm{int}$ and $\mathit{typeof}(\mathrm{true}) = \mathit{typeof}(\mathrm{false}) = \mathrm{bool}$).

Rules SEPlus, SEEq, SENot, and SEAnd execute the subexpressions and then form a new symbolic expression with $+$, $=$, $\neg$, or $\wedge$, respectively. Notice that these rules place requirements on the subexpressions; for example, SEPlus requires that the subexpressions reduce to symbolic integers, and SENot requires that the subexpression reduce to a guard (a symbolic boolean). If a subexpression does not reduce to an expression of the right type, then symbolic execution fails. Thus, these rules form a symbolic execution engine that does very precise dynamic type checking.

Rule SELet symbolically executes $e_1$ and then binds the resulting symbolic expression to $x$ for the execution of $e_2$. The last two rules, SEIf-True and SEIf-False, model a pure, non-deterministic version of the kind of symbolic execution popularized by DART [Godefroid et al. 2005], CUTE [Sen et al. 2005], EXE [Cadar et al. 2006], and KLEE [Cadar et al. 2008]. When we reach a conditional, we conceptually fork execution, extending the path condition with $g_1$ or $\neg g_1$ to indicate the branch taken. EXE and KLEE would both invoke an SMT solver at this point to decide whether one or both branches are feasible, and then try all feasible paths. DART and CUTE, in contrast, would continue down one path as guided by an underlying concrete run (so-called “concolic execution”), but then would ask an SMT solver later whether the path not taken was feasible and, if so, come back and take it eventually. All of these implementation choices can be viewed as optimizations to prune infeasible paths or hints to focus the exploration. Since we are not concerned with performance in our formalism, we simply extend the path condition and continue; eventually, when symbolic execution completes, we check the path condition and discard the path if it is infeasible. To get sound symbolic execution, we compute a set of symbolic executions and require that all feasible paths are explored (see Section 3.2).

Figure 2. Symbolic execution for pure expressions.
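The rules of Figure 2 can be approximated by an OCaml function that returns every derivable outcome, so the non-determinism of SEIf-True and SEIf-False becomes a list with one entry per path. This is a sketch under stated assumptions, not the figure itself: memory is omitted (a state is just the path condition, a list of guards read as a conjunction), $=$ is restricted to integers, an empty list stands for a stuck or ill-typed execution, and infeasible paths are not pruned, matching the deferred feasibility check described above. All names are hypothetical.

```ocaml
(* Sketch with hypothetical names; memory and feasibility checks omitted. *)
type ty = TInt | TBool

(* Symbolic expressions; each form determines its own type. *)
type sym =
  | Alpha of string * ty (* symbolic variable alpha:tau *)
  | Int of int
  | Bool of bool
  | Plus of sym * sym
  | Eq of sym * sym
  | Not of sym
  | And of sym * sym

let typeof = function
  | Alpha (_, t) -> t
  | Int _ | Plus _ -> TInt
  | Bool _ | Eq _ | Not _ | And _ -> TBool

(* Pure source expressions. *)
type expr =
  | Var of string
  | VInt of int
  | VBool of bool
  | EPlus of expr * expr
  | EEq of expr * expr
  | ENot of expr
  | EAnd of expr * expr
  | If of expr * expr * expr
  | Let of string * expr * expr

(* A state is only the path condition here (a conjunction of guards). *)
type state = sym list

(* All outcomes derivable for e; [] means no rule applies. *)
let rec exec (env : (string * sym) list) (s : state) (e : expr) :
    (state * sym) list =
  (* Shared shape of SEPlus, SEEq, SEAnd: run both operands, check their
     types, then defer by building a new symbolic expression. *)
  let binop a b ta tb mk =
    List.concat_map
      (fun (s1, x) ->
        List.concat_map
          (fun (s2, y) ->
            if typeof x = ta && typeof y = tb then [ (s2, mk x y) ] else [])
          (exec env s1 b))
      (exec env s a)
  in
  match e with
  | Var x -> (
      (* SEVar: stuck if x is unbound *)
      match List.assoc_opt x env with Some v -> [ (s, v) ] | None -> [])
  | VInt n -> [ (s, Int n) ] (* SEVal *)
  | VBool b -> [ (s, Bool b) ]
  | EPlus (a, b) -> binop a b TInt TInt (fun x y -> Plus (x, y))
  | EEq (a, b) -> binop a b TInt TInt (fun x y -> Eq (x, y))
  | EAnd (a, b) -> binop a b TBool TBool (fun x y -> And (x, y))
  | ENot a ->
      (* SENot *)
      List.concat_map
        (fun (s1, g) -> if typeof g = TBool then [ (s1, Not g) ] else [])
        (exec env s a)
  | Let (x, e1, e2) ->
      (* SELet *)
      List.concat_map (fun (s1, v) -> exec ((x, v) :: env) s1 e2) (exec env s e1)
  | If (c, t, f) ->
      (* SEIf-True and SEIf-False: fork, recording g or its negation *)
      List.concat_map
        (fun (s1, g) ->
          if typeof g <> TBool then []
          else exec env (g :: s1) t @ exec env (Not g :: s1) f)
        (exec env s c)
```

For instance, executing if b then 1 else 2 with b bound to a symbolic boolean yields two outcomes, one per branch, each carrying the guard it assumed.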

Sometimes, the symbolic executor may want to throw away information (e.g., replace a symbolic expression for a complicated memory read with a fresh symbolic variable). Such a rule is straightforward to add, but as discussed in Section 3.2, a nested typed block {t e t} serves a similar purpose.

Deferral Versus Execution. Consider again the rules for symbolic execution on pure expressions in Figure 2. Excluding the trivial SEVal rule, the first set of rules (SEPlus, SEEq, SENot, and SEAnd) and the second set (SELet, SEVar, SEIf-True, SEIf-False) seem qualitatively different. The first set simply obtains symbolic expressions for the subexpressions and forms a new symbolic expression of the corresponding operator, essentially deferring any reasoning about the operation (e.g., to an SMT solver). In contrast, the second set does not accumulate any such symbolic expression but rather chooses a possible concrete execution to follow. For example, we can view SEIf-True as choosing to assume that $g_1$ is concretely true and proceeding to symbolically execute $e_2$. This assumption is recorded in the path condition. (The SELet and SEVar rules are degenerate execution rules where no assumptions need to be made, because there is only one possible concrete execution for each.) Alternatively, we see that there are symbolic expression forms for $+$, $=$, $\neg$, and $\wedge$ but not for let, program variables, and if.

Although it is not commonly presented as such, the decision of deferral versus execution is a design choice. For example, let us include an if-then-else symbolic expression $g\,?\,s_1 : s_2$ (using a C-style conditional syntax) that evaluates to $s_1$ if $g$ evaluates to true and to $s_2$ otherwise. Then we could defer the evaluation of the conditional to the solver with the following rule:


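$$
\frac{E \vdash \langle S ; e_1 \rangle \Downarrow \langle S_1 ; g_1 \rangle \qquad E \vdash \langle S_1 ; e_2 \rangle \Downarrow \langle S_2 ; u_2{:}\tau \rangle \qquad E \vdash \langle S_1 ; e_3 \rangle \Downarrow \langle S_3 ; u_3{:}\tau \rangle \qquad S' = \langle g_1 ? g(S_2) : g(S_3) \,;\, g_1 ? m(S_2) : m(S_3) \rangle}{E \vdash \langle S ; \mathtt{if}\ e_1\ \mathtt{then}\ e_2\ \mathtt{else}\ e_3 \rangle \Downarrow \langle S' ; (g_1 ? u_2{:}\tau : u_3{:}\tau){:}\tau \rangle}\ \textsc{SEIf-Defer}
$$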

Notice that here we also have to extend the $\cdot\,?\,\cdot:\cdot$ form to operate over memory as well. With this rule, we need not “fork” symbolic execution at all. However, even with conditional symbolic expressions and conditional symbolic memory, this rule is more conservative than the SEIf-True and SEIf-False execution rules, as it requires both branches to have the same type.
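A sketch of SEIf-Defer as a merge step, given the results already computed for the guard and for both branches. To stay short it assumes a path condition that is a single guard and omits the conditional memory $g_1 ? m(S_2) : m(S_3)$; if_defer and Cannot_defer are hypothetical names. The raised exception shows why the rule is more conservative than forking: both branches must agree on a type.

```ocaml
(* Sketch with hypothetical names; conditional memory is omitted. *)
type ty = TInt | TBool

type sym =
  | Alpha of string * ty
  | Int of int
  | Bool of bool
  | Ite of sym * sym * sym * ty (* (g ? s1 : s2):tau *)

let typeof = function
  | Alpha (_, t) | Ite (_, _, _, t) -> t
  | Int _ -> TInt
  | Bool _ -> TBool

(* Simplified state: a single guard as the path condition, so that it can
   itself be merged with Ite. *)
type state = { pc : sym }

exception Cannot_defer

(* SEIf-Defer, given the guard g1 and the results of both branches, each
   computed from the state after evaluating the guard. No fork happens. *)
let if_defer (g1 : sym) ((s2, r2) : state * sym) ((s3, r3) : state * sym) :
    state * sym =
  let t = typeof r2 in
  if typeof g1 <> TBool || typeof r3 <> t then raise Cannot_defer;
  ({ pc = Ite (g1, s2.pc, s3.pc, TBool) }, Ite (g1, r2, r3, t))
```

Merging branches that return an int and a bool raises Cannot_defer, whereas the forking rules would simply explore the two paths separately.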

Conversely, other rules may also be made non-deterministic in a manner similar to SEIf-*. For example, SEVar may instead return an arbitrary value $v$ and add $E(x) = v$ to the path condition, a style that resembles hybrid concolic testing [Majumdar and Sen 2007]. A special case of execution rules are ones that apply only when we have concrete values during symbolic execution and thus do not need to “fork.” For example, we could have a rule SEPlus-Conc that applies to two concrete values $n_1, n_2$ and returns their sum. This approach is reminiscent of partial evaluation.
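SEPlus-Conc can be sketched as a smart constructor that folds two concrete operands and otherwise falls back to the deferred SEPlus form. Types are dropped because every operand here is an integer; the names are hypothetical.

```ocaml
(* Sketch with hypothetical names; all symbolic values are integers here. *)
type sym =
  | Alpha of string (* symbolic integer *)
  | Int of int
  | Plus of sym * sym

(* SEPlus-Conc when both operands are concrete (compute the sum, no
   deferral); otherwise build the deferred SEPlus expression. *)
let plus a b =
  match (a, b) with
  | Int n1, Int n2 -> Int (n1 + n2)
  | _ -> Plus (a, b)
```

As in partial evaluation, concrete subterms collapse eagerly: plus (plus (Int 1) (Int 1)) (Alpha "a") builds Plus (Int 2, Alpha "a").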

These choices trade off the amount of work done between the symbolic executor and the underlying SMT solver. For example, SEIf-Defer introduces many disjunctions into symbolic expressions, which then may be hard to solve efficiently. To match current practice, we stick with the forking variant for conditionals, but we believe our system would also be sound with SEIf-Defer.

Symbolic References. Figure 3 continues our symbolic executor definition with rules for updatable references. We use deferral rules for all aspects of references in our formalization. Rule SERef evaluates $e_1$ and extends $m(S_1)$ with an allocation for fresh symbolic pointer $\alpha$. Similarly, rule SEAssign extends $m(S_2)$ to record that $s_1$ now points to $s_2$. Observe that allocations and writes are simply logged during symbolic execution for later inspection. Also, notice that we allow any value to be written to $s_1$, even if it does not match the type annotation on $s_1$. In contrast, standard type systems require that any writes to memory preserve types, since the type system does not track enough information about pointers to be sound if that property is violated. Symbolic execution tracks every possible program execution precisely, and so it can allow arbitrary memory writes.
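A sketch of the logged memory and the SERef and SEAssign steps follows. The representation mirrors the grammar ($\mu$, writes, allocations), but se_ref, se_assign, the fresh-name counter, and the choice to return only the updated memory from se_assign are our assumptions. As in the text, se_assign accepts a value whose type differs from the pointer's annotation; we only require that the target have some pointer type.

```ocaml
(* Sketch with hypothetical names; not the rules of Figure 3 verbatim. *)
type ty = TInt | TBool | TRef of ty

type sym =
  | Alpha of string * ty
  | Int of int
  | Bool of bool
  | Read of mem * sym * ty (* m[u:tau ref]:tau *)

(* Symbolic memory: unknown base memory mu plus a log of writes
   (s |-> s') and allocations (s |->a s'). *)
and mem =
  | Mu
  | Write of mem * sym * sym
  | Alloc of mem * sym * sym

let typeof = function
  | Alpha (_, t) | Read (_, _, t) -> t
  | Int _ -> TInt
  | Bool _ -> TBool

let counter = ref 0

(* SERef: log an allocation of a fresh pointer of type tau ref holding v. *)
let se_ref (m : mem) (v : sym) : mem * sym =
  incr counter;
  let p = Alpha (Printf.sprintf "alpha%d" !counter, TRef (typeof v)) in
  (Alloc (m, p, v), p)

(* SEAssign: log that p now holds v, even if v's type differs from p's
   annotation. Returns None when p is not a pointer at all. *)
let se_assign (m : mem) (p : sym) (v : sym) : mem option =
  match typeof p with
  | TRef _ -> Some (Write (m, p, v))
  | TInt | TBool -> None
```

Nothing is looked up or overwritten: both steps only append to the log, leaving alias reasoning for later.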

In SEDeref, we evaluate $e_1$ to a pointer $u_1{:}\tau\ \mathrm{ref}$ and then produce the symbolic expression $m(S_1)[u_1{:}\tau\ \mathrm{ref}]{:}\tau$ to represent the contents of that location. However, here we are faced with a challenge: we are not actually looking up the contents of memory; rather, we are simply forming a symbolic expression to represent the contents. How, then, do we determine the type of the pointed-to value? We need the type so that we can halt symbolic execution later if that value is used in a type-incorrect manner. That is, we do not want to defer the discovery of a potential type error.

Figure 3. Symbolic execution for updatable references.

Our  solution  is  to  use  the  type  annotation  on  the  pointer  to 
get  the  type  of  the  contents — but  above  we  just  explained  that 
SEAssign  allows  writes  to  violate  those  type  annotations.  There 
are  many  potential  ways  to  solve  this  problem.  We  could  invoke 
an  SMT  solver  to  compute  the  actual  set  of  addresses  that  could 
be  dereferenced  and  fork  execution  for  each  one.  Or  we  could 
proceed  as  our  implementation  and  use  an  external  alias  analysis 
to  conservatively  model  all  possible  locations  that  could  be  read  to 
check  that  the  values  at  all  locations  have  the  same  type  (Section  4). 
However,  to  keep  the  formal  system  simple,  we  choose  a  very 
coarse  solution:  we  require  that  all  pointers  in  memory  are  well- 
typed  with  the  check  F  m{Si)  ok. 
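The coarse check $\vdash m(S_1)\ \mathsf{ok}$ can be read as a single pass over such a log. The sketch below is one possible reading under a hypothetical encoding with only int, bool, and one level of ref; it replays the rules described in the next paragraph, starting from $U = \emptyset$ for the base memory (Empty-OK), leaving $U$ unchanged at allocations (Alloc-OK), dropping earlier inconsistent writes to a syntactically equal address at a well-typed write (Overwrite-OK), and adding each ill-typed write to $U$ (Arbitrary-NotOK). It never applies Arbitrary-NotOK to a well-typed write, since that could only enlarge $U$, so the memory passes M-OK exactly when the final $U$ is empty. All names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding with one level of ref: int, bool, int ref, bool ref. */
typedef enum { B_INT, B_BOOL } base_t;
typedef struct { base_t base; bool is_ref; } type_t;
typedef struct { int id; type_t ty; } sym_t;                 /* u:tau */
typedef struct { bool is_alloc; sym_t addr; sym_t val; } entry_t;

#define MAX_PENDING 64

/* Syntactic equality of address expressions, as in Overwrite-OK. */
bool same_sym(sym_t a, sym_t b) {
    return a.id == b.id && a.ty.base == b.ty.base && a.ty.is_ref == b.ty.is_ref;
}

/* A write is well typed when addr : tau ref and val : tau. */
bool well_typed_write(sym_t addr, sym_t val) {
    return addr.ty.is_ref && !val.ty.is_ref && val.ty.base == addr.ty.base;
}

/* Decides |- m ok by computing the smallest U with |- m ok U. */
bool memory_ok(const entry_t *log, size_t n) {
    sym_t pending[MAX_PENDING];   /* U: addresses of inconsistent writes that persist */
    size_t u = 0;                 /* Empty-OK: |- mu ok {} */
    for (size_t i = 0; i < n; i++) {
        if (log[i].is_alloc)
            continue;                                   /* Alloc-OK: U unchanged */
        if (well_typed_write(log[i].addr, log[i].val)) {
            size_t k = 0;                               /* Overwrite-OK */
            for (size_t j = 0; j < u; j++)
                if (!same_sym(pending[j], log[i].addr))
                    pending[k++] = pending[j];
            u = k;
        } else if (u < MAX_PENDING) {
            pending[u++] = log[i].addr;                 /* Arbitrary-NotOK */
        } else {
            return false;                               /* out of room: be conservative */
        }
    }
    return u == 0;                                      /* M-OK needs U = {} */
}
```

Replacing `same_sym` with a solver query under the current path condition would give the more precise variant that the text mentions below.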

This judgment is defined in the bottom portion of Figure 3 in terms of the auxiliary judgment $\vdash m\ \mathsf{ok}\ U$, which means that memory $m$ is consistently typed (pointers point to values of the right type), except for mappings in $U$. There are four cases for this judgment. Empty-OK says that an arbitrary well-typed memory $\mu$ is consistently typed. Similarly, Alloc-OK says that if $m$ is consistently typed except for potentially inconsistent writes in $U$, then adding an allocation preserves consistent typing up to $U$. Rule Overwrite-OK says that if $\vdash m\ \mathsf{ok}\ U$ and we extend $m$ with a well-typed write to $u_1$, then any previous, inconsistent writes to locations $s_1 = u_1{:}\tau\ \mathsf{ref}$ can be ignored. Here, by $=$ we mean syntactic equivalence, but in practice we could query a solver to validate such an equality given the current path condition. Rule Arbitrary-NotOK says that any write can be added to $U$ and viewed as potentially inconsistent. Finally, M-OK says that $\vdash m\ \mathsf{ok}$ if $m$ has no inconsistent writes that persist, that is, if $\vdash m\ \mathsf{ok}\ \emptyset$. Together, these rules ensure that the type assigned to the result of a dereference is sound. We can also see how SEDeref could be made more precise by requiring consistency only up to a set of writes $U$ and querying a solver to show that $u_1{:}\tau\ \mathsf{ref}$ is disequal to all the address expressions in $U$.

3.2  Mixing

In the previous section, we considered type checking and symbolic execution separately, ignoring the blocks that indicate a switch in analysis. Figure 4 shows the two mix rules that capture switching between analyses.

Rule TSymBlock describes how to type check a symbolic block {s e s}, that is, how to apply symbolic execution to derive the type of a subexpression for a type checker. First, we construct an environment $\Sigma$ that maps each variable $x$ in $\Gamma$ to a fresh symbolic variable $\alpha_x$, whose type is extracted from $\Gamma$. Then we run the symbolic execution under $\Sigma$, starting in a state with $\mathit{true}$ for the path condition and a fresh symbolic variable $\mu$ to stand for the current memory. Recall that, because of SEIf-True and SEIf-False, symbolic execution is actually non-deterministic: it conceptually can branch at conditionals. If we want to soundly model the entire possible behavior of $e$, we need to execute all paths. Thus, we run the symbolic executor $n$ times, yielding final states $\langle S_i; u_i{:}\tau\rangle$ for $i \in 1..n$, and we require that the disjunction of the guards from all executions form a tautology. This constraint ensures that we exhaustively explore every possible path (see Section 3.3 about soundness). If all those paths execute successfully without type errors and return a value of the same type $\tau$, then that is the type of expression $e$. We also check that all paths leave memory in a consistent state.

Symbolic execution has typically been used as an unsound analysis in which there is no exhaustiveness check like $\mathit{exhaustive}(\ldots)$ in TSymBlock. We can also model such an unsound analysis by weakening $\mathit{exhaustive}(\ldots)$ to a "good enough" check.

The other rule, SETypBlock, describes how to symbolically execute a typed block {t e t}, that is, how to apply the type checker in the middle of a symbolic execution. We begin by deriving a type environment $\Gamma$ that maps local variables to the types of the symbols they are mapped to in $\Sigma$. This mapping is described precisely by the judgment $\vdash \Sigma : \Gamma$, which is straightforward. We also require that the current symbolic memory state be consistent, since the typed block relies purely on type information (rather than tracking pointer values as symbolic execution does). Then we type check $e$ in $\Gamma$, yielding a type $\tau$. The typed block itself symbolically evaluates to a fresh symbolic variable $\alpha$ of type $\tau$. Since the typed block may have written to memory, we conservatively set the memory of the output state to a fresh $\mu'$, indicating that we know nothing about the memory state at that point except that it is consistent.

Note that in our formalism, we do not have typed blocks within typed blocks, or symbolic blocks within symbolic blocks, though these would be trivial to add (by passing through).

Figure 4. Mixing symbolic execution and type checking. Panels: Block Typing ($\Gamma \vdash e : \tau$); Block Symbolic Execution ($\Sigma \vdash \langle S; e\rangle \Downarrow \langle S'; s\rangle$), containing SETypBlock; Symbolic and Typing Environment Conformance ($\vdash \Sigma : \Gamma$).

$$\textsc{SETypBlock}\quad \frac{\vdash \Sigma : \Gamma \qquad \vdash m(S)\ \mathsf{ok} \qquad \Gamma \vdash e : \tau \qquad \mu',\ \alpha\ \text{fresh}}{\Sigma \vdash \langle S;\ \{t\ e\ t\}\rangle \Downarrow \langle S[m \mapsto \mu'];\ \alpha{:}\tau\rangle}$$

$$\frac{\mathit{dom}(\Sigma) = \mathit{dom}(\Gamma) \qquad \Sigma(x) = u_x{:}\Gamma(x)\ \ (\text{for all } x \in \mathit{dom}(\Gamma))}{\vdash \Sigma : \Gamma}$$


Why  Mix?  The  mix  rules  are  essentially  as  precise  as  possible 
given  the  strengths  and  limitations  of  each  analysis.  The  nested 
analysis  starts  with  the  maximum  amount  of  information  that  can 
be  extracted  from  the  other  static  analysis — for  symbolic  blocks, 
the  only  available  information  for  symbolic  execution  is  types, 
whereas  for  typed  blocks,  the  type  checker  only  cares  about  types  of 
variables  and  thus  abstracts  away  the  symbolic  expressions.  After 
the  nested  analysis  is  complete,  the  result  is  similarly  passed  back 
to  the  enclosing  analysis  as  precisely  as  possible. 

For this paper, we deliberately chose two analyses at opposite ends of the precision spectrum: type checking is cheap and flow-insensitive, with a rather coarse abstraction, while symbolic execution is expensive, flow- and path-sensitive (and context-sensitive if we add functions), with a minimal amount of abstraction (i.e., it is not even a proper program analysis per se, as there are no termination guarantees). They also work in such different manners that it does not seem particularly natural to combine them in tighter ways (e.g., as a reduced product of abstract interpreters [Cousot and Cousot 1979]). We find it surprising how much additional precision we can obtain, and how many kinds of idioms we can analyze, from such a simple mixing of an entirely standard type system and a typical symbolic executor as-is (as we see in Section 2). We note that a type system capturing all of the examples in Section 2 would likely be quite advanced (involving, for example, dependent types).

However, as can be seen in Figure 4, the conversion between these two analyses may be extremely lossy. For example, in SETypBlock, the memory after returning from the type checker must be a fresh arbitrary memory $\mu'$ because $e$ may make any number of writes not captured by the type system and thus not seen by the symbolic executor. We can also imagine mixing any number of analyses in arbitrary combinations, yielding different precision/efficiency tradeoffs. For example, if we were to use a type and effect system rather than just a type system, we could avoid introducing a completely fresh memory $\mu'$ in SETypBlock; instead, we could find the effect of $e$ and apply this “havoc” operation only to locations that could have been changed.

3.3  Soundness 

In this section, we sketch the soundness of Mix, which is described in full detail in Appendix A. The key feature of our proof is that, aside from the mix rule cases, it reuses the standalone type soundness and symbolic execution soundness proofs essentially as-is.

We show soundness with respect to a standard big-step operational semantics for our simple language of expressions. Our semantics is given by a judgment $E \vdash \langle M; e\rangle \Downarrow r$. This says that in a concrete environment $E$, an initial concrete memory $M$ and an expression $e$ evaluate to a result $r$. A concrete environment maps variables to values, while a concrete memory maps locations to values. The evaluation result $r$ is either a concrete memory-value pair $\langle M'; v\rangle$ or a distinguished error token.

To prove mix soundness, we consider type soundness and symbolic execution soundness simultaneously. While type soundness is standard, we discuss it briefly, as it is a part of mix soundness and provides intuition for symbolic execution soundness.

For type soundness, we introduce a memory type environment $\Delta$ that maps locations to types, and we update the typing judgment to carry this additional environment, as $\Gamma \vdash_\Delta e : \tau$, where $\Delta$ is constant in all rules. In many proofs, $\Delta$ is included in $\Gamma$ rather than being called out separately, but for mix soundness, separating locations from variables makes the proof easier. To show type soundness, we need a relation between the concrete environment and memory $(E; M)$ and the type environment and memory typing $(\Gamma; \Delta)$. We write this relation as $(E; M) \sim (\Gamma; \Delta)$, which informally says two things: (1) the type environment $\Gamma$ abstracts the concrete environment $E$, that is, the concrete value $v$ mapped by each variable $x$ in $E$ has type $\Gamma(x)$, and (2) the memory typing $\Delta$ abstracts the concrete memory $M$, that is, the concrete value $v$ at each location $l$ in $M$ has type $\Delta(l)$. We also talk about the second component in isolation, in which case we write $M \sim \Delta$ to mean that memory typing $\Delta$ abstracts the concrete memory $M$.

Type soundness is the first part of mix soundness (statement 1 in Theorem 1, shown below). Let us consider the pieces. Suppose we have a concrete evaluation $E \vdash \langle M; e\rangle \Downarrow r$. We further suppose that $e$ has type $\tau$ in typing environments that are sound with respect to the concrete state (i.e., $(E; M) \sim (\Gamma; \Delta)$). Then the result $r$ must be a memory-value pair $\langle M'; v\rangle$, where the resulting concrete memory is abstracted by $\Delta'$, an extension of $\Delta$, and the resulting value $v$ has the same type $\tau$ under the extended memory typing $\Delta'$. Notice that this captures the notions that well-typed expressions cannot evaluate to error and that evaluation preserves typing.

For symbolic execution soundness, we need to ensure that a symbolic execution faithfully models actual concrete executions. Let $V$ be a valuation, which is a finite mapping from symbolic values $\alpha$ to concrete values $v$ or concrete memories $M$. We write $[\![s]\!]^V$, $[\![m]\!]^V$, and $[\![\Sigma]\!]^V$ for the natural extension of $V$ to arbitrary symbolic expressions, memories, and the symbolic environment. Symbolic execution begins with symbolic values $\alpha$ for unknown inputs and accumulates a symbolic expression $s$ that represents the result of the program. Then, at a high level, if symbolic execution is sound, a concrete run that begins with $[\![\alpha]\!]^V$ for its inputs should produce the value $[\![s]\!]^V$.
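Applying a valuation is ordinary substitution followed by evaluation. A minimal sketch, assuming a hypothetical miniature of symbolic expressions with only integer constants, symbolic variables $\alpha_k$, and addition (memories are omitted), computes $[\![s]\!]^V$ and fails if $s$ mentions a variable outside the domain of $V$; the type and function names are illustrative only.

```c
#include <stddef.h>

/* Hypothetical miniature of symbolic expressions: constants, symbolic
   variables alpha_k, and addition (memories are omitted). */
typedef enum { S_CONST, S_VAR, S_ADD } sym_kind;

typedef struct sexp {
    sym_kind kind;
    int value;                    /* S_CONST: the constant */
    int var;                      /* S_VAR: the index k of alpha_k */
    const struct sexp *l, *r;     /* S_ADD: the operands */
} sexp;

/* A valuation V: a finite map from alpha_0..alpha_{n-1} to concrete ints. */
typedef struct { const int *val; size_t n; } valuation;

/* Computes [[s]]^V into *out. Returns 0 on success, or -1 if s mentions a
   symbolic variable outside dom(V). */
int eval_under(const sexp *s, valuation V, int *out) {
    int a, b;
    switch (s->kind) {
    case S_CONST:
        *out = s->value;
        return 0;
    case S_VAR:
        if (s->var < 0 || (size_t)s->var >= V.n) return -1;
        *out = V.val[s->var];
        return 0;
    case S_ADD:
        if (eval_under(s->l, V, &a) != 0 || eval_under(s->r, V, &b) != 0) return -1;
        *out = a + b;
        return 0;
    }
    return -1;
}
```

For instance, with $s = \alpha_0 + 1$ and $V$ mapping $\alpha_0$ to 41, `eval_under` yields 42.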

To formalize this notion, we need a soundness relation between the concrete evaluation state and the symbolic execution state, just as in type soundness. Our soundness relations for symbolic execution states have the following form:

$$(E; M) \sim_{\Delta_0; V; \Delta} (\Sigma; m)$$

This relation captures two key properties. First, applying the valuation $V$ to the symbolic state should yield the concrete state (i.e., $[\![\Sigma]\!]^V = E$ and $[\![m]\!]^V = M$). Second, the types of symbolic expressions in $\Sigma$ and $m$ must be correctly related. Recall that an additional property of our typed symbolic execution is that it tracks the types of symbolic expressions and halts upon encountering ill-typed expressions. The typing of symbolic reference expressions must be with respect to some memory typing. This memory typing is given by $\Delta_0$ and $\Delta$. For technical reasons, we need to separate the locations in the arbitrary memory on entry, $\Delta_0$, from the locations that come from allocations during symbolic execution, $\Delta$; to get a typing for the entire memory, we write $\Delta_0 * \Delta$ to mean the union of the sub-memory typings $\Delta_0$ and $\Delta$, which have disjoint domains. Analogously, we also have a symbolic soundness relation that applies to memory-value pairs: $\langle M; v\rangle \sim_{\Delta_0; V; \Delta} \langle m; s\rangle$.

As alluded to above, we first consider a notion of symbolic execution soundness with respect to a concrete execution. This notion is stated in the second part of mix soundness (Theorem 1). Analogous to type soundness, it says the following. Suppose we have a concrete evaluation $E \vdash \langle M; e\rangle \Downarrow r$ and a symbolic execution $\Sigma \vdash \langle S; e\rangle \Downarrow \langle S'; s\rangle$ such that the symbolic state is an abstraction of the concrete state (i.e., $(E; M) \sim_{\Delta_0; V; \Delta} (\Sigma; m(S))$). There is one more premise, $[\![g(S')]\!]^V$, which says that the path condition accumulated during symbolic execution holds under this valuation. This constrains the concrete and symbolic executions to follow the same path. With these premises, symbolic execution soundness says that the result of symbolic execution, that is, the memory-symbolic expression pair $\langle m(S'); s\rangle$, is an abstraction of the concrete evaluation result, which must be a memory-value pair $\langle M'; v\rangle$.

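Theorem 1 (Mix Soundness)

1. If $E \vdash \langle M; e\rangle \Downarrow r$ and $\Gamma \vdash_\Delta e : \tau$ such that $(E; M) \sim (\Gamma; \Delta)$, then $\emptyset \vdash_{\Delta'} v : \tau$ and $M' \sim \Delta'$ for some $M'$, $v$, $\Delta'$ such that $r = \langle M'; v\rangle$ and $\Delta' \supseteq \Delta$.

2. If $E \vdash \langle M; e\rangle \Downarrow r$ and $\Sigma \vdash \langle S; e\rangle \Downarrow \langle S'; s\rangle$ such that $(E; M) \sim_{\Delta_0; V; \Delta} (\Sigma; m(S))$ and $[\![g(S')]\!]^V$, then $r \sim_{\Delta_0'; V'; \Delta'} \langle m(S'); s\rangle$ for some $V' \supseteq V$ and some $\Delta_0'$, $\Delta'$ such that $\Delta_0' * \Delta' \supseteq \Delta_0 * \Delta$.

Proof

By simultaneous induction on the derivations of $E \vdash \langle M; e\rangle \Downarrow r$. The proof is given in Appendix A.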

This statement of symbolic execution soundness (part 2 of Theorem 1) is what we need to show that Mix is sound, but at first glance it seems suspect because it does not say anything about symbolic execution being exhaustive. However, if we look at type checking a symbolic block (i.e., rule TSymBlock), exhaustiveness is ensured through the $\mathit{exhaustive}(\ldots)$ constraint.

In particular, we can state exhaustive symbolic execution as a corollary, and the case for TSymBlock proceeds in the same manner as this corollary.

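Corollary 1.1 (Exhaustive Symbolic Execution)

Suppose $E \vdash \langle M; e\rangle \Downarrow \langle M'; v\rangle$ and we have $n > 0$ symbolic executions

$\Sigma \vdash \langle\langle \mathit{true}; m\rangle; e\rangle \Downarrow \langle S_i; s_i\rangle$ such that
$\mathit{exhaustive}(g(S_1), \ldots, g(S_n))$ and
$(E; M) \sim_{\Delta_0; V; \Delta} (\Sigma; m)$.

Then $\langle M'; v\rangle \sim_{\Delta_0'; V'; \Delta'} \langle m(S_i); s_i\rangle$ for some $i \in 1..n$, $V' \supseteq V$, and some $\Delta_0'$, $\Delta'$ such that $\Delta_0' * \Delta' \supseteq \Delta_0 * \Delta$.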

Here we say that if we have $n > 0$ symbolic executions that each start with a path condition of $\mathit{true}$ and whose resulting path conditions are exhaustive (i.e., $g(S_1) \vee \cdots \vee g(S_n)$ is a tautology, meaning it holds under any valuation $V$), then one of those symbolic executions must match the concrete execution. Observe that in this statement, there is no premise on the resulting path condition; rather, we start with an initial path condition of $\mathit{true}$.
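The tautology condition behind $\mathit{exhaustive}(\ldots)$ can be made concrete for the case the forking rules produce most often, where each path condition is a conjunction of branch conditions. The sketch assumes, hypothetically, that those branch conditions are boolean symbolic variables or their negations and that there are at most 16 of them, so it can check every valuation $V$ directly; a real implementation would instead ask an SMT solver whether the negated disjunction is unsatisfiable. The encoding and names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: a path condition g(S_i) is a conjunction of literals
   over boolean symbolic variables alpha_0..alpha_{k-1}. Bit j of `pos`
   requires alpha_j = true; bit j of `neg` requires alpha_j = false. */
typedef struct { unsigned pos, neg; } guard;

/* Does the guard hold under the valuation encoded by the bits of `assignment`? */
bool guard_holds(guard g, unsigned assignment) {
    return (assignment & g.pos) == g.pos && (assignment & g.neg) == 0;
}

/* exhaustive(g_1, ..., g_n): every valuation satisfies some guard, i.e. the
   disjunction is a tautology. Brute force over all 2^k valuations, k <= 16. */
bool exhaustive(const guard *g, size_t n, unsigned k) {
    if (k > 16) return false;                   /* outside this sketch's range */
    for (unsigned v = 0; v < (1u << k); v++) {
        bool covered = false;
        for (size_t i = 0; i < n && !covered; i++)
            covered = guard_holds(g[i], v);
        if (!covered) return false;             /* v follows no explored path */
    }
    return true;
}
```

For example, the guards $\alpha_0$ and $\neg\alpha_0$ from one forked conditional pass the check, while $\alpha_0$ alone does not, because the valuation with $\alpha_0$ false follows no explored path.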

4.  Mixy:  a  Prototype  of  Mix  for  C 

We have developed Mixy, a prototype tool for C that uses Mix to detect null pointer errors. Mixy mixes a (flow-insensitive) type qualifier inference system with a symbolic executor. Mixy is built on top of the CIL front-end for C [Necula et al. 2002], and our type qualifier inference system, CilQual, is essentially a simplified CIL reimplementation of the type qualifier inference algorithm described by Foster et al. [2006]. Our symbolic executor, Otter [Reisner et al. 2010], uses STP [Ganesh and Dill 2007] as its SMT solver and works in a manner similar to KLEE [Cadar et al. 2008].

Type  Qualifiers  and  Null  Pointer  Errors.  For  this  application,  we 
introduce  two  qualifier  annotations  for  pointers:  nonnull  indicates 
that  a  pointer  must  not  be  null,  and  null  indicates  that  a  pointer  may 
be  null.  Our  inference  system  automatically  annotates  uses  of  the 
NULL  macro  with  the  null  qualifier  annotation.  The  type  qualifier 
inference  system  generates  constraints  among  known  qualifiers 
and  unknown  qualifier  variables,  solves  those  constraints,  and  then 
reports  a  warning  if  null  values  may  flow  to  nonnull  positions. 
Thus,  our  type  qualifier  inference  system  ensures  pointers  that  may 
be  null  cannot  be  used  where  non-null  pointers  are  required. 

For  example,  consider  the  following  C  code: 

1  void free(int *nonnull x);
2  int *id(int *p) { return p; }
3  int *x = NULL;
4  int *y = id(x);
5  free(y);

Here, on line 1, we annotate free to indicate that it takes a nonnull pointer. Then on line 3, we initialize x to NULL, pass that value through id, and store the result in y on line 4. Finally, on line 5, we call free with NULL.

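Our qualifier inference system will generate the following types and constraints (with some simplifications, and ignoring l- and r-value issues):

$$\mathit{free} : \mathsf{int}\,{*}\,\mathsf{nonnull} \to \mathsf{void} \qquad x : \mathsf{int}\,{*}\,\beta$$
$$\mathit{id} : \mathsf{int}\,{*}\,\gamma \to \mathsf{int}\,{*}\,\delta \qquad y : \mathsf{int}\,{*}\,\varepsilon$$
$$\mathsf{null} = \beta \qquad \beta = \gamma \qquad \gamma = \delta \qquad \delta = \varepsilon \qquad \varepsilon = \mathsf{nonnull}$$

Here $\beta$, $\gamma$, $\delta$, and $\varepsilon$ are variables that stand for unknown qualifiers. Put together, these constraints require $\mathsf{null} = \mathsf{nonnull}$, which is not allowed, and hence qualifier inference will report an error for this program.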
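Because every constraint in this example is an equality, the conflict can be found with a union-find structure: merge the two sides of each constraint, then report an error if null and nonnull end up in the same class. This is only a sketch for the equality constraints shown; the numbering of $\beta$, $\gamma$, $\delta$, $\varepsilon$ as nodes 2 to 5 and the function names are hypothetical, and it makes no claim about how CilQual solves its constraints internally.

```c
#include <stdbool.h>

/* Nodes 0 and 1 are the qualifier constants; qualifier variables get 2, 3, ...
   (hypothetical numbering). */
enum { Q_NULL = 0, Q_NONNULL = 1, Q_MAX = 64 };

static int parent[Q_MAX];

static int uf_find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];   /* path halving */
        x = parent[x];
    }
    return x;
}

/* Solves the equality constraints lhs[i] = rhs[i] (0 <= i < m) over `nodes`
   nodes by union-find, and reports an error exactly when they force
   null = nonnull. */
bool qualifiers_conflict(const int *lhs, const int *rhs, int m, int nodes) {
    for (int i = 0; i < nodes && i < Q_MAX; i++)
        parent[i] = i;
    for (int i = 0; i < m; i++) {
        int a = uf_find(lhs[i]);
        int b = uf_find(rhs[i]);
        parent[a] = b;                   /* merge the two classes */
    }
    return uf_find(Q_NULL) == uf_find(Q_NONNULL);
}
```

With all five constraints, null and nonnull land in one class and the error is reported; dropping $\varepsilon = \mathsf{nonnull}$ (as if free were not annotated) leaves them apart.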

Our symbolic executor also looks for null pointer errors. The symbolic executor tracks C values at the bit level, using a representation similar to KLEE [Cadar et al. 2008]. A null pointer is represented as the value 0, and the symbolic executor reports an error if 0 is ever dereferenced.

Typed and Symbolic Blocks. In our formal system, we allow typed and symbolic blocks to be introduced anywhere in the program. In Mixy, these blocks can only be introduced around whole function bodies by annotating a function as MIX(typed) or MIX(symbolic), and Mixy switches between qualifier inference and symbolic execution at function calls. We can simulate blocks within functions by manually extracting the relevant code into a fresh function.

Skipping some details for the moment, this switching process works as follows. When Mixy is invoked, the programmer specifies (as a command-line option) whether to begin in a typed block or a symbolic block. In either case, we first initialize global variables as appropriate for the analysis, and then analyze the program starting with main. In symbolic execution mode, we begin simulating the program at the entry function, and at calls to functions that are either unmarked or marked as symbolic, we continue symbolic execution into the function body. At calls to functions marked with MIX(typed), we switch to type inference starting with that function.

In type inference mode, we begin analysis at the entry function f, applying qualifier inference to f and all functions reachable from f in the call graph, up to the frontier of any functions that are marked with MIX(symbolic). We use CIL's built-in pointer analysis to find the targets of calls through function pointers. Finally, we switch to symbolic execution for each function marked MIX(symbolic) that was discovered at the frontier.

In this section, we describe implementation details that are not captured by our formal system from Section 3:

•  The formal system Mix is based on a type checking system where all types are given. Since type qualifier inference involves variables, we need to handle variables that are not yet constrained to concrete type qualifiers when transitioning to a symbolic block (Section 4.1).

•  We need to translate information about aliasing between blocks (Section 4.2).

•  Since the same block or function may be called from multiple contexts, we need to avoid repeating analysis of the same function (Section 4.3).

•  Since functions can contain blocks and be recursive, we need to handle recursion between typed and symbolic blocks (Section 4.4).

Finally, we present our initial experience with Mixy (Section 4.5), and we discuss some limitations and future work (Section 4.6).

4.1  Translating  Null/Non-null  and  Type  Variables 

At  transitions  between  typed  and  symbolic  blocks,  we  need  to 
translate  null  and  nonnull  annotations  back  and  forth. 

From Types to Symbolic Values. Suppose local variable x has type int *nonnull. Then in the symbolic executor, we initialize x to point to a fresh memory cell. If x has type int *null, then we initialize x to be $(\alpha{:}\mathsf{bool})\ ?\ \mathit{loc} : 0$, where $\alpha$ is a fresh boolean that may be either true or false, $\mathit{loc}$ is a newly initialized pointer (described in Section 4.2), and 0 represents null. Hence this expression means x may be either null or non-null, and the symbolic executor will try both possibilities.
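The two cases can be sketched as follows, representing pointers by integers with 0 as null (as in the bit-level representation above) and modeling the executor's fork on the fresh boolean $\alpha$ by trying both of its values. The function names and the `loc` argument, which stands for the newly initialized cell, are hypothetical; Otter's actual interfaces are not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { QUAL_NONNULL, QUAL_NULL } qual_t;

/* Value given to a pointer variable x on entry to a symbolic block.
   loc stands for a newly initialized cell (hypothetical parameter);
   0 encodes null, as in the executor's bit-level representation. */
uintptr_t init_pointer(qual_t q, bool alpha, uintptr_t loc) {
    if (q == QUAL_NONNULL)
        return loc;                 /* int *nonnull: always a fresh cell */
    return alpha ? loc : 0;         /* int *null: (alpha:bool) ? loc : 0 */
}

/* Models the fork on the fresh boolean alpha by trying both of its values and
   counts the paths on which dereferencing x would be reported as an error. */
int faulting_paths(qual_t q, uintptr_t loc) {
    int faults = 0;
    for (int a = 0; a <= 1; a++)
        if (init_pointer(q, a == 1, loc) == 0)
            faults++;
    return faults;
}
```

A nonnull variable never faults, while a null one faults on exactly the path where $\alpha$ is false.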

A more interesting case occurs if a variable x has a type with a qualifier variable (e.g., int *$\beta$). In this case, we first try to solve the current set of constraints to see whether $\beta$ has a solution as either null or nonnull, and if it does, we perform the translation given above. Otherwise, if $\beta$ could be either, we first optimistically assume it is nonnull.

We can safely use this assumption when returning from a typed block to a symbolic block, since such a qualifier variable can only be introduced when variables are aliased (e.g., via pointer assignment), a case that is separately taken into account by the Mixy memory model (Section 4.2).

However, if we use this assumption when entering a symbolic block from a typed block, we may later discover that our assumption was too optimistic. For example, consider the following code:

{t int *x; {s x = NULL; s}; {s free(x); s} t}

In the type system, x has type int *$\beta$, where initially $\beta$ is unconstrained. Suppose that we analyze the symbolic block on the right before the one on the left. This scenario could happen because the analysis of the enclosing typed block does not model control-flow order (i.e., it is flow-insensitive). Then initially, we would think the call to free was safe because we optimistically treat the unconstrained $\beta$ as nonnull, but this is clearly not accurate here.

The solution is, as expected, to repeat our analyses until we reach a fixed point. In this case, after we analyze the left symbolic block, we will discover a new constraint on x, and hence when we iterate and reanalyze the right symbolic block, we will discover the error. We are computing a least fixed point because we start with optimistic assumptions (nothing is null) and then monotonically discover that more expressions may be null.
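The iteration for the example above can be sketched as a loop that re-analyzes both symbolic blocks, in the unfavorable right-then-left order, until a pass adds no new constraint. The state (a single qualifier variable $\beta$ for x) and the function names are hypothetical stand-ins, and each block's effect on the constraints is hard-coded rather than computed by a symbolic executor.

```c
#include <stdbool.h>

/* Hypothetical state for the example: x has qualifier variable beta, treated
   optimistically as nonnull until a block adds the constraint beta = null. */
typedef struct {
    bool beta_is_null;     /* has the constraint beta = null been added? */
    bool error_reported;   /* has some block reported a null error? */
} qual_state;

/* Left block {s x = NULL; s}: translating x back to a type adds beta = null.
   Returns true if that constraint is new. */
bool analyze_left(qual_state *st) {
    bool added = !st->beta_is_null;
    st->beta_is_null = true;
    return added;
}

/* Right block {s free(x); s}: x is initialized from beta (nonnull unless
   constrained otherwise); passing a possibly null x to free is an error. */
bool analyze_right(qual_state *st) {
    if (st->beta_is_null)
        st->error_reported = true;
    return false;                  /* adds no new constraint */
}

/* Re-analyze both blocks, right before left, until a pass adds nothing.
   Constraints only grow, so this reaches the least fixed point.
   Returns the number of passes. */
int analyze_to_fixpoint(qual_state *st) {
    int passes = 0;
    bool changed;
    do {
        changed = analyze_right(st);
        changed = analyze_left(st) || changed;
        passes++;
    } while (changed);
    return passes;
}
```

The first pass analyzes free(x) under the optimistic assumption and finds nothing, then adds $\beta = \mathsf{null}$ from the left block. The second pass re-analyzes free(x), reports the error, and adds nothing, so the loop stops.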

From Symbolic Values to Types. We use the SMT solver to discover the possible final values of variables and translate those to the appropriate types. Given a variable x that is mapped to symbolic expression $s$, we ask whether $g \wedge (s = 0)$ is satisfiable, where $g$ is the path condition. If the condition is satisfiable, we constrain x to be null in the type system. There are no nonnull constraints to be added, since they correspond to places in code where pointers are dereferenced, which are not reflected in symbolic values.
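The query can be sketched with a brute-force stand-in for the SMT solver. Here the path condition $g$ and the expression $s$ bound to x are modeled, hypothetically, as C functions of a single integer input $\alpha$, and satisfiability of $g \wedge (s = 0)$ is decided by trying every $\alpha$ in a small range; the example functions are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: the path condition g and the expression s bound to x
   are functions of one symbolic input alpha. */
typedef bool (*pred_fn)(int alpha);
typedef long (*expr_fn)(int alpha);

/* Brute-force substitute for the SMT query "is g /\ (s = 0) satisfiable?",
   trying every alpha in [lo, hi]. A true answer means x must be constrained to
   null in the type system; false means no constraint is added. */
bool may_be_null(pred_fn g, expr_fn s, int lo, int hi) {
    for (int a = lo; a <= hi; a++)
        if (g(a) && s(a) == 0)
            return true;
    return false;
}

/* Illustrative instances: x = (alpha > 0) ? 0x1000 : 0 under two path conditions. */
bool g_true(int alpha) { (void)alpha; return true; }
bool g_positive(int alpha) { return alpha > 0; }
long s_guarded(int alpha) { return alpha > 0 ? 0x1000 : 0; }
```

With x bound to $(\alpha > 0)\ ?\ \mathtt{0x1000} : 0$, the variable may be null under the path condition $\mathit{true}$, so it would be constrained to null; under the path condition $\alpha > 0$ it cannot be, and no constraint is added.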

Thus,  null  pointers  from  symbolic  blocks  will  lead  to  errors 
in  typed  blocks  if  they  flow  to  a  nonnull  position;  whereas  null 
pointers  from  typed  blocks  will  lead  to  errors  in  symbolic  blocks  if 
they  are  dereferenced  symbolically. 

4.2  Aliasing  and  Mixy’s  Memory  Model 

The formal system Mix defers all reasoning about aliasing until as late as possible. As alluded to in Section 3, this choice may be difficult to implement in practice given limitations in the constraint solver. Thus in Mixy, we use a pre-pass pointer analysis to initialize aliasing relationships.

Typed to Symbolic Block. When we switch from a typed block to a symbolic block, we initialize a fresh symbolic memory, which may include pointers. We use a variant of the approach described in Section 3 that makes use of aliasing information to be more precise. Rather than modeling memory as one big array, Mixy models memory as a map from locations to separate arrays. Aliasing within arrays is modeled as in our formalism, and aliasing between arrays is modeled using Morris's general axiom of assignment [Bornat 2000; Morris 1982].

C also supports a richer variety of types such as arrays and structs, as well as recursive data structures. Mixy lazily initializes memory in an incremental manner so that we can sidestep the issue of initializing an arbitrarily recursive data structure; Mixy only initializes as much as is required by the symbolic block. We use CIL's pointer analysis to determine possible points-to relationships and initialize memory accordingly.

Symbolic to Typed Block. An issue arises from using type inference when we switch from a symbolic block to a typed block. Consider the following code snippets, which are identical except that y points to r on the left, and y points to x on the right:


// left: *y not aliased to x
int *x = ...;
int *r = ...; int **y = &r;
{t // okay
x = NULL;
assert_nonnull(*y); t}

// right: *y aliased to x
int *x = ...;
int **y = &x;
{t // should fail
x = NULL;
assert_nonnull(*y); t}


In both cases, at entry to the typed blocks, x and *y are assigned types $\beta$ ref and $\gamma$ ref, respectively, based on their current values. Notice, however, that for the code on the right, we should also have $\beta = \gamma$. Otherwise, after the assignment x = NULL, we will not know that *y is also NULL.

This  example  illustrates  an  important  difference  between  type 
inference  and  type  checking.  In  type  checking,  this  problem  cannot 
arise  because  every  value  has  a  known  type,  and  we  only  have 
to  check  that  those  types  are  consistent.  However,  type  inference 
actually  has  to  discover  richer  information,  such  as  what  types  must 
be  equal  because  of  aliasing,  in  order  to  find  a  valid  typing. 

One solution to this problem would be to translate aliasing information from symbolic execution to and from type constraints. In Mixy, we use an alternative solution that is easier to implement: we use CIL's built-in may pointer analysis to conservatively discover points-to relationships. When we transition from a symbolic block to a typed block, we add constraints requiring that all may-aliased expressions have the same type.
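The added constraints can be sketched as follows, assuming (hypothetically) that the may-pointer analysis is summarized by a bitset of abstract locations for each lvalue. Two lvalues may alias when their bitsets intersect, and each such pair yields an equality between their qualifier variables. This is not CIL's pointer-analysis interface; the struct and function are illustrative only.

```c
/* Hypothetical summary of a may-pointer analysis: each lvalue carries the set
   of abstract locations it may denote (bit i = location i) and the index of
   its qualifier variable. */
typedef struct {
    const char *name;
    unsigned locs;
    int qual_var;
} lvalue;

/* Writes one equality constraint out[k] = { q1, q2 } (meaning q1 = q2) for
   every pair of lvalues that may alias, up to max constraints; returns how
   many were written. */
int alias_constraints(const lvalue *lv, int n, int (*out)[2], int max) {
    int k = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if ((lv[i].locs & lv[j].locs) != 0 && k < max) {
                out[k][0] = lv[i].qual_var;
                out[k][1] = lv[j].qual_var;
                k++;
            }
    return k;
}
```

On the right-hand snippet, x and *y both may denote the location of x, so the pair yields $\beta = \gamma$; on the left, *y denotes r, the sets are disjoint, and no constraint is added.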


4.3  Caching  Blocks 

In C, a block or function may be called from many different call sites, so we may need to analyze that block in the context of each call site. Since it can be quite costly to analyze that block repeatedly, we cache the calling context and the results of the analysis for that block, and we reuse the results when the block is called again with a compatible calling context. Conceptually, caching is similar to compositional symbolic execution [Godefroid 2007]; in Mixy, we implement caching as an extension to the mix rules, using types to summarize blocks rather than symbolic constraints.

Caching  Symbolic  Blocks.  Before  we  translate  the  types  from  the 
enclosing  typed  block  to  symbolic  values,  we  first  check  to  see 
if  we  have  previously  analyzed  the  same  symbolic  block  with  a 
compatible  calling  context.  We  define  the  calling  context  to  be  the 
types  for  all  variables  that  will  be  translated  into  symbolic  values, 
and  we  say  two  calling  contexts  are  compatible  if  every  variable 
has  the  same  type  in  both  contexts. 

If we have not analyzed the symbolic block before with a compatible calling context, we translate the types into symbolic values, analyze the symbolic block, and translate the symbolic values to types by adding type constraints as usual. At this point, we cache the translated types for this calling context; we cache the translated types instead of the symbolic values since the translation from symbolic values to types is expensive. Otherwise, if we have analyzed the symbolic block before with a compatible calling context, we use the cached results by adding null type constraints for null cached types, in a manner similar to translating symbolic values. Finally, in both the cached and uncached cases, we restore aliasing relationships and return to the enclosing typed block as usual.
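The lookup-or-analyze logic for symbolic blocks can be sketched as follows. For illustration only, the sketch assumes that a calling context is a vector of null/nonnull qualifiers, one per translated variable, and that compatibility is element-wise equality as defined above. `analyze_symbolic_block` is a hypothetical stand-in for the translate, execute, and translate-back steps; restoring aliasing relationships is not modeled.

```c
/* Hypothetical sketch of block-result caching keyed by calling context. */
typedef enum { Q_NONNULL = 0, Q_NULL = 1 } qual;

#define MAX_VARS 8
#define MAX_ENTRIES 32

typedef struct {
    int block_id;
    int nvars;
    qual context[MAX_VARS]; /* qualifiers of the translated variables on entry */
    qual result[MAX_VARS];  /* translated qualifiers on exit */
} cache_entry;

static cache_entry cache[MAX_ENTRIES];
static int cache_len = 0;
static int analyses_run = 0; /* how many times the expensive path ran */

/* Two contexts are compatible when every variable has the same qualifier. */
static int compatible(const qual *a, const qual *b, int n) {
    for (int i = 0; i < n; i++)
        if (a[i] != b[i]) return 0;
    return 1;
}

/* Stand-in for: translate types to symbolic values, symbolically execute the
   block, translate results back to types. This toy version pretends the
   block stores a fresh allocation into variable 0. */
static void analyze_symbolic_block(int block_id, const qual *in, qual *out, int n) {
    (void)block_id;
    analyses_run++;
    for (int i = 0; i < n; i++) out[i] = in[i];
    if (n > 0) out[0] = Q_NONNULL;
}

/* Reuse a cached result for a compatible context; otherwise analyze the
   block and cache the translated types. Assumes n <= MAX_VARS. */
static void enter_symbolic_block(int block_id, const qual *ctx, qual *out, int n) {
    for (int e = 0; e < cache_len; e++) {
        const cache_entry *c = &cache[e];
        if (c->block_id == block_id && c->nvars == n && compatible(c->context, ctx, n)) {
            for (int i = 0; i < n; i++) out[i] = c->result[i];
            return;
        }
    }
    analyze_symbolic_block(block_id, ctx, out, n);
    if (cache_len < MAX_ENTRIES) {
        cache_entry *c = &cache[cache_len++];
        c->block_id = block_id;
        c->nvars = n;
        for (int i = 0; i < n; i++) {
            c->context[i] = ctx[i];
            c->result[i] = out[i];
        }
    }
}
```

Calling `enter_symbolic_block` twice with the same context runs the analysis once; a context that differs in any variable triggers a fresh analysis.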

Caching Typed Blocks. Caching for typed blocks is implemented similarly, but with one difference: unlike above, we first translate symbolic values into types, then use the translated types as the calling context, and finally cache the final types as the result of analyzing the typed block. We could have chosen to use symbolic values as the calling context and the result, but since translating symbolic values to types and comparing symbolic values both involve a similar number of calls to the SMT solver, we chose to use types to unify the implementation.

4.4 Recursion between Typed and Symbolic Blocks

A typed block and a symbolic block may recursively call each other, and we found block recursion to be surprisingly common in our experiments. Without special handling for recursion, MiXY would keep switching between them indefinitely, since a block is analyzed with a fresh initial state upon every entry. Therefore, we need to detect when recursion occurs, beginning with either a typed block or a symbolic block, and handle it specially.

To handle recursion, we maintain a block stack to keep track of blocks that are currently being analyzed. Similar to a function call stack, the block stack is a stack of blocks and their calling contexts, which are defined in terms of types as in caching (Section 4.3). We push blocks onto the stack upon entry and pop them upon return.
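A minimal sketch of such a block stack follows, assuming qualifier-vector calling contexts as in the caching discussion. The frame layout and the names `push_block`, `pop_block`, and `find_recursion` are hypothetical. The search also sets a per-frame `recursive` flag, which is the marking step used when recursion is detected.

```c
/* Hypothetical sketch of a block stack for detecting recursion between
   typed and symbolic blocks; contexts are qualifier vectors. */
typedef enum { Q_NONNULL = 0, Q_NULL = 1 } qual;

#define MAX_DEPTH 64
#define MAX_VARS 8

typedef struct {
    int block_id;
    int nvars;
    qual context[MAX_VARS];
    int recursive; /* set once a nested entry matches this frame */
} frame;

static frame stack[MAX_DEPTH];
static int depth = 0;

static int same_context(const qual *a, const qual *b, int n) {
    for (int i = 0; i < n; i++)
        if (a[i] != b[i]) return 0;
    return 1;
}

/* Push on block entry; returns 0 if the stack is full or n is too large. */
static int push_block(int block_id, const qual *ctx, int n) {
    if (depth >= MAX_DEPTH || n > MAX_VARS) return 0;
    frame *f = &stack[depth++];
    f->block_id = block_id;
    f->nvars = n;
    f->recursive = 0;
    for (int i = 0; i < n; i++) f->context[i] = ctx[i];
    return 1;
}

/* Pop on block return. */
static void pop_block(void) {
    if (depth > 0) depth--;
}

/* Before entering a block: search for the same block with a compatible
   context. On a match, mark that frame recursive and return its index (the
   caller then returns an assumption instead of entering the block);
   otherwise return -1. */
static int find_recursion(int block_id, const qual *ctx, int n) {
    for (int i = depth - 1; i >= 0; i--) {
        frame *f = &stack[i];
        if (f->block_id == block_id && f->nvars == n &&
            same_context(f->context, ctx, n)) {
            f->recursive = 1;
            return i;
        }
    }
    return -1;
}
```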

Before entering a block, we first look for recursion by searching the block stack for the same block with a compatible calling context. If recursion is detected, then instead of entering the block, we mark the matching block on the stack as recursive and return an assumption about the result. For the initial assumption, we use the calling context of the marked block, optimistically assuming that the block has no effect. When we eventually return to the marked block, we compare the assumption with the actual result of analyzing the block. If the assumption is compatible with the actual result, we return the result; otherwise, we re-analyze the block using the actual result as the updated assumption until we reach a fixed point.
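The optimistic fixed-point loop can be sketched on a toy two-variable block whose effect on variable 0 depends on what the recursive call is assumed to return for variable 1. Everything here is a hypothetical model: qualifiers form the two-point lattice with nonnull below null, `analyze_once` stands in for one full analysis of the marked block, and termination relies on that stand-in being monotone over a finite lattice, which we assume rather than show.

```c
/* Hypothetical sketch of the optimistic fixed point for a recursive block. */
typedef enum { Q_NONNULL = 0, Q_NULL = 1 } qual;

#define NVARS 2

/* Least upper bound in the two-point lattice nonnull <= null. */
static qual join(qual a, qual b) {
    return (a == Q_NULL || b == Q_NULL) ? Q_NULL : Q_NONNULL;
}

/* Stand-in for one analysis of the block, given the result assumed for its
   nested recursive call: variable 0 receives what the call yields for
   variable 1, joined with its value on entry. */
static void analyze_once(const qual *ctx, const qual *assumed, qual *out) {
    out[0] = join(ctx[0], assumed[1]);
    out[1] = ctx[1];
}

/* Start by assuming the recursive call has no effect (result = calling
   context); re-analyze with the actual result until the two agree.
   Returns the number of analyses performed. */
static int solve_recursive_block(const qual *ctx, qual *result) {
    qual assumed[NVARS];
    int iterations = 0;
    for (int i = 0; i < NVARS; i++) assumed[i] = ctx[i];
    for (;;) {
        int same = 1;
        iterations++;
        analyze_once(ctx, assumed, result);
        for (int i = 0; i < NVARS; i++)
            if (result[i] != assumed[i]) same = 0;
        if (same) return iterations;
        for (int i = 0; i < NVARS; i++) assumed[i] = result[i];
    }
}
```

With context (nonnull, null), the first analysis contradicts the optimistic assumption, so a second analysis runs and both variables end up null; with context (nonnull, nonnull), the assumption is confirmed after a single analysis.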

4.5 Preliminary Experience

We gained some initial experience with MiXY by running it on vsftpd-2.0.7 and looking for false null pointer warnings from pure type qualifier inference that can be eliminated with the addition of symbolic execution. Since MiXY is in the prototype stage, we started small. Rather than annotate all dereferences as requiring nonnull, we added just one nonnull annotation:

sysutil_free(void * nonnull p_ptr) MIX(typed) { ... }

The sysutil_free function wraps the free system call and checks, at run time, that the pointer argument is not null. In essence, our analysis tries to check this property statically. We annotated sysutil_free itself with MIX(typed), so MiXY need not symbolically execute its body: our annotation captures the important part of its behavior for our analysis.

We then ran MiXY on vsftpd, beginning with typing at the outermost level. We examined the resulting warnings and then tried adding MIX(symbolic) annotations to eliminate warnings. We succeeded in several cases, discussed next. We did not fully examine many of the other cases, but Section 4.6 describes some preliminary observations about MiXY in practice. Note that the code snippets shown below are abbreviated, and many identifiers have been shortened. We should also point out that all the examples below eliminate one or more imprecise qualifier flows from type qualifier inference; this pruning may or may not suppress a given warning, depending on whether other flows could produce the same warning.

Case 1: Flow and path insensitivity in sockaddr_clear

1  void sockaddr_clear(struct sockaddr **p_sock) MIX(symbolic) {
2    if (*p_sock != NULL) {
3      sysutil_free(*p_sock);
4      *p_sock = NULL;
5    }
6  }

This function is implicated in a false warning: due to flow insensitivity in the type system, the null assignment on line 4 flows to the argument to sysutil_free on line 3, even though the assignment occurs after the call. Also, the type system ignores the null check on line 2 due to path insensitivity.

Marking sockaddr_clear with MIX(symbolic) successfully resolves this warning: the symbolic executor determines that *p_sock is not null when used as an argument to sysutil_free().

Case 2: Path and context insensitivity in str_next_dirent

1   void str_alloc_text(struct mystr* p_str) MIX(typed);
2   const char* sysutil_next_dirent(...) MIX(typed) {
3     if (p_dirent == NULL) return NULL;
4   }
5   void str_next_dirent(...) MIX(symbolic) {
6     const char* p_filename = sysutil_next_dirent(...);
7     if (p_filename != NULL)
8       str_alloc_text(p_filename);
9   }
10  ... str_alloc_text(str); sysutil_free(str); ...

In this example, the function str_next_dirent calls sysutil_next_dirent on line 6, which may return a null value. Hence p_filename may be null. The type system ignores the null check on line 7 and, due to context insensitivity, conflates p_filename with other variables, such as str, that are passed to str_alloc_text (lines 8 and 10). Hence the type system believes str may be null. However, str is used as an argument to sysutil_free (line 10), which leads the type system to report a false warning.

Annotating function str_next_dirent as symbolic, while leaving sysutil_next_dirent and str_alloc_text as typed, successfully eliminates this warning: the symbolic executor correctly determines that p_filename is not null when it is used as an argument to str_alloc_text. And although the extra precision does not matter in this particular example, notice that the call on line 8 will be analyzed in a separate invocation of the type system from the call on line 10, thus introducing some context sensitivity.

Case 3: Flow and path insensitivity in dns_resolve and main

1   void main_BLOCK(struct sockaddr** p_sock) MIX(symbolic) {
2     *p_sock = NULL;
3     dns_resolve(p_sock, tunable_pasv_address);
4   }
5   int main(...) {
6     ... main_BLOCK(&p_addr); ...; sysutil_free(p_addr); ...
7   }
8   void dns_resolve(struct sockaddr** p_sock,
9                    const char* p_name) {
10    struct hostent* hent = gethostbyname(p_name);
11    sockaddr_clear(p_sock);
12    if (hent->h_addrtype == AF_INET)
13      sockaddr_alloc_ipv4(p_sock);
14    else if (hent->h_addrtype == AF_INET6)
15      sockaddr_alloc_ipv6(p_sock);
16    else
17      die("gethostbyname(): neither IPv4 nor IPv6");
18  }

There are two sources of null values in the code above: *p_sock is set to null on line 2; and sockaddr_clear, which was previously marked as symbolic in Case 1 above, also sets *p_sock to null on line 11 in dns_resolve. Due to flow insensitivity in the type system, both of these null values eventually reach sysutil_free on line 6, leading to false warnings.

However, we can see that these null values are actually overwritten by non-null values on lines 13 and 15, where sockaddr_alloc_ipv4 or sockaddr_alloc_ipv6 allocates the appropriate structure and assigns it to *p_sock (not shown). We can eliminate these warnings by extracting the code in main that includes both null sources into a symbolic block.

Also, there is a call to gethostbyname on line 10 that we need to handle. Here, we define a well-behaved, symbolic model of gethostbyname that returns only AF_INET and AF_INET6, as is standard (not shown). This causes the symbolic executor to skip the last branch on line 17, which we need because we cannot analyze die symbolically: it eventually calls a function pointer, an operation for which our symbolic executor currently has limited support. We also cannot put gethostbyname or die in typed blocks in this case, since *p_sock is null at those points and this would result in false warnings.

Case 4: Helping symbolic execution with symbolic function pointers

1  void sysutil_exit_BLOCK(void) MIX(typed) {
2    if (s_exit_func) (*s_exit_func)();
3  }
4  void sysutil_exit(int exit_code) {
5    sysutil_exit_BLOCK();
6    exit(exit_code);
7  }

In several instances, we would like to evaluate symbolic blocks that call sysutil_exit, defined on line 4, which in turn calls exit to terminate the program. However, before terminating the program, sysutil_exit calls the function pointer s_exit_func on line 2. Our symbolic executor does not support calling symbolic function pointers (i.e., function pointers whose targets are unknown), so instead we extract the call to s_exit_func into a typed block to analyze the call conservatively.

4.6 Discussion and Future Work

Our preliminary experience provides some real-world validation of Mix’s efficacy in removing false positives. However, there are several limitations to be addressed in future work.

Most importantly, the overwhelming source of issues in MiXY is its coarse treatment of aliasing, which relies on an imprecise pointer analysis. One immediate consequence is that it impedes performance in the symbolic executor: if an imprecise pointer analysis returns large points-to sets for pointers, translating symbolic pointers to type constraints becomes slow, because we first need to check whether each pointer target is valid under the current path condition by calling the SMT solver, and then determine whether any valid targets may be null. This leads to a significant slowdown: our small examples from Section 4.5 take less than a second to run without symbolic blocks, but from 5 to 25 seconds with one symbolic block, and about 60 seconds with two symbolic blocks. This issue is further compounded by the fixed-point computation that repeatedly analyzes symbolic blocks nested in typed blocks or for handling recursion.
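To make the cost pattern concrete, the sketch below counts solver queries when translating one symbolic pointer: one feasibility query per points-to target, then one nullness query per feasible target. The `target` struct and the stub "solver" functions, which simply return answers stored with each target, are hypothetical; they model only the query pattern described above, not MiXY's code or any real SMT interface.

```c
/* Hypothetical model of translating one symbolic pointer to a qualifier.
   Each "solver" query is a stub that returns a precomputed answer and
   increments a counter; a real tool would issue an SMT call instead. */
typedef struct {
    int feasible; /* stub answer: may the pointer equal this target on the current path? */
    int may_null; /* stub answer: may the target's contents be null? */
} target;

static int solver_calls = 0;

static int query_feasible(const target *t) { solver_calls++; return t->feasible; }
static int query_may_null(const target *t) { solver_calls++; return t->may_null; }

/* Returns 1 if the translated type should carry the null qualifier. All
   targets are examined (no early exit), so the number of queries is
   |points-to set| + |feasible targets|. */
static int translate_pointer(const target *pts, int n) {
    int is_null = 0;
    for (int i = 0; i < n; i++) {
        if (!query_feasible(&pts[i])) continue; /* target invalid on this path */
        if (query_may_null(&pts[i])) is_null = 1;
    }
    return is_null;
}
```

Under this model, the query count grows linearly with the size of the points-to set, so an imprecise pointer analysis directly multiplies solver work at every boundary crossing.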

We also noticed several cases in vsftpd where calls to symbolic blocks would help introduce context sensitivity to distinguish calls to malloc. However, since we rely on a context-insensitive pointer analysis to restore aliasing relationships when switching to typed blocks, these calls will again be conflated. The issue especially affects the analysis of typed-to-symbolic-to-typed recursive blocks, because the nested typed blocks are polluted by aliasing relationships from the entire program. A similar issue occurs with symbolic blocks, as pointers are initialized to point to targets from the entire program, rather than being limited to the enclosing context.

Just as in the formalism, MiXY has to consider the entire memory when switching from typed to symbolic or vice versa. Since this was a deliberate design decision, we were not surprised to find that it has an impact on performance and leads to many limitations in practice. Any temporary violation of type invariants from symbolic blocks would immediately be flagged when switching to typed blocks, even if it has no effect on the code in the typed blocks. In the other direction, symbolic blocks are forced to start with a fresh memory when switching from typed blocks, even if there were no effects.

Ultimately, we believe that these issues can be addressed with more precise information about aliasing as well as effects, perhaps extracted directly from the type inference constraints and symbolic execution.

In addition to checking for null pointer errors, we plan to extend MiXY to check other properties, such as data races, and to mix other types of analysis together. We also plan to investigate automatic placement of typed/symbolic blocks, i.e., essentially using Mix as an intermediate language for combining analyses. One idea is to begin with just typed blocks and then incrementally add symbolic blocks to refine the result. This approach resembles abstraction refinement (e.g., Ball and Rajamani [2002]; Henzinger et al. [2004]), except that the refinement can be obtained using completely different analyses instead of one particular family of abstractions.

5. Related Work

There are several threads of related work. There have been numerous proposals for static analyses based on type systems; see Palsberg and Millstein [2008] for pointers. Symbolic execution was first proposed by King [1976] as an enhanced testing strategy, but was difficult to apply for many years. Recently, SMT solvers have become very powerful, making symbolic execution much more attractive, as even very complex path conditions can be solved surprisingly fast. There have been many recent, impressive results using symbolic execution for bug finding [Cadar et al. 2006, 2008; Godefroid et al. 2005; Sen et al. 2005]. These systems use symbolic execution to explore a small subset of the possible program paths, since in the presence of loops with symbolic bounds, pure symbolic execution will not terminate in a reasonable amount of time (unless loop invariants are assumed). In the Mix formalism, in contrast, we use symbolic execution in a sound manner by exploring all paths, which is possible because we can use type checking on parts of the code where symbolic execution takes too long. Of course, it is also possible to mix unsound symbolic execution with type checking, to gain whatever level of assurance the user desires.

There are several static analyses that can operate at different levels of abstraction. Bandera [Corbett et al. 2000] is a model checking system that uses abstraction-based program specialization, in which the user specifies the exact abstractions to use. System Z is an abstract interpreter generator in which the user can tune the level of abstraction to trade off cost and precision [Yi and Harrison 1993]. Tuning these systems requires a deep knowledge of program analysis. In contrast, we believe that Mix’s tradeoff is easier to understand: one selects between essentially no abstraction (symbolic execution) or abstraction in terms of types, which are arguably the most successful, well-understood static analysis.

Mix bears some resemblance to static analysis based on abstraction refinement, such as SLAM [Ball and Rajamani 2002], BLAST [Henzinger et al. 2004], and client-driven pointer analysis [Guyer and Lin 2005]. These tools incrementally refine their abstraction of the program as necessary for analysis. Adding symbolic blocks to a program can be seen as introducing a very precise “refinement” of the program abstraction.

There are a few systems that combine type checking or inference with other analyses. Dependent types provide an elegant way to augment standard types with very rich type refinements [Xi and Pfenning 1999]. Liquid types combine Hindley-Milner style type inference with predicate abstraction [Rondon et al. 2008, 2010]. Hybrid types combine static typing, theorem proving, and dynamic typing [Flanagan 2006]. All of these systems combine types with refinements at a deep level: the refinements are placed “on top of” the type structure. In contrast, Mix uses a much coarser approach in which the precise analysis is almost entirely separated from the type system, except for a thin interface between the two systems.

Many others have considered the problem of combining program analyses. A reduced product in abstract interpretation [Cousot and Cousot 1979] is a theoretical description of the most precise combination of two abstract domains. It is typically obtained via manually defined reduction operators that depend on the domains being combined. Another example of combining abstract domains is the logical product of Gulwani and Tiwari [2006]. Combining program analyses for compiler optimizations is also well studied (e.g., Lerner et al. [2002]). In all of these cases, the combinations strengthen the kinds of derivable facts over the entire program. With Mix, we instead analyze separate parts of the program with different analyses. Finally, Mix was partially inspired by Nelson-Oppen style cooperating decision procedures [Nelson and Oppen 1979]. One important feature of the Nelson-Oppen framework is that it provides an automatic method for distributing the appropriate formula fragments to each solver (provided that the solvers meet certain criteria). Clearly Mix is targeted at solving a very different problem, but it would be an interesting direction for future work to try to extend Mix into a similar framework that can automatically integrate analyses that have appropriately structured interfaces.

6. Conclusion

We presented Mix, a new approach for mixing type checking and symbolic execution to trade off efficiency and precision. The key feature of our approach is that the mixed systems are essentially completely independent, and they are used in an off-the-shelf manner. Only at the boundaries between typed blocks (which the user inserts to indicate where type checking should be used) and symbolic blocks (the symbolic checking annotation) do we invoke special mix rules to translate information between the two systems. We proved that Mix is sound (which implies that type checking and symbolic execution are also independently sound). We also described a preliminary implementation, MiXY, which performs null/non-null type qualifier inference for C. We identified several cases in which symbolic execution could eliminate false positives from type inference. In sum, we believe that Mix provides a promising new approach to trade off precision and efficiency in static analysis.

A. Soundness

To show the soundness of Mix, we consider a standard big-step operational semantics for our simple language of expressions (Section A.1). We then show the soundness of type checking and symbolic execution separately (Section A.2 and Section A.3, respectively), which will provide the basis for Mix soundness (Section A.4). This section is an expansion of Section 3.3.

Type soundness is entirely standard. For symbolic execution soundness, we consider two notions in turn: when an execution is sound with respect to a path of exploration, and when a set of executions covers all possible concrete execution paths. One challenge for showing Mix soundness is that symbolic execution allows a temporary violation of the type invariant on memory used by the type checker, which must be restored before entering a type checking phase.

A.1 Operational Semantics

Figure 5 gives a big-step operational semantics for our language of expressions $e$. In these rules, $M$ is a concrete memory that maps locations $l$ to values $v$. We make this a mapping, rather than a list, to reflect the fact that the memory is actually updated on the running system. The notation $M[l \mapsto v]$ indicates an update or extension
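Big-Step Operational Semantics. Judgment form: $E \vdash \langle M; e \rangle \Downarrow \langle M'; v \rangle$

$$\frac{}{E, x : v \vdash \langle M; x \rangle \Downarrow \langle M; v \rangle}\;\text{(SVar)} \qquad \frac{}{E \vdash \langle M; v \rangle \Downarrow \langle M; v \rangle}\;\text{(SVal)}$$

$$\frac{E \vdash \langle M; e_1 \rangle \Downarrow \langle M_1; n_1 \rangle \quad E \vdash \langle M_1; e_2 \rangle \Downarrow \langle M_2; n_2 \rangle}{E \vdash \langle M; e_1 + e_2 \rangle \Downarrow \langle M_2; n_1 + n_2 \rangle}\;\text{(SPlus)}$$

$$\frac{E \vdash \langle M; e_1 \rangle \Downarrow \langle M_1; v_1 \rangle \quad E \vdash \langle M_1; e_2 \rangle \Downarrow \langle M_2; v_2 \rangle}{E \vdash \langle M; e_1 = e_2 \rangle \Downarrow \langle M_2; v_1 = v_2 \rangle}\;\text{(SEq)}$$

$$\frac{E \vdash \langle M; e_1 \rangle \Downarrow \langle M_1; b_1 \rangle}{E \vdash \langle M; \neg e_1 \rangle \Downarrow \langle M_1; \neg b_1 \rangle}\;\text{(SNot)}$$

$$\frac{E \vdash \langle M; e_1 \rangle \Downarrow \langle M_1; b_1 \rangle \quad E \vdash \langle M_1; e_2 \rangle \Downarrow \langle M_2; b_2 \rangle}{E \vdash \langle M; e_1 \wedge e_2 \rangle \Downarrow \langle M_2; b_1 \wedge b_2 \rangle}\;\text{(SAnd)}$$

$$\frac{E \vdash \langle M; e_1 \rangle \Downarrow \langle M_1; \mathsf{true} \rangle \quad E \vdash \langle M_1; e_2 \rangle \Downarrow \langle M_2; v \rangle}{E \vdash \langle M; \mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \rangle \Downarrow \langle M_2; v \rangle}\;\text{(SIf-True)}$$

$$\frac{E \vdash \langle M; e_1 \rangle \Downarrow \langle M_1; \mathsf{false} \rangle \quad E \vdash \langle M_1; e_3 \rangle \Downarrow \langle M_3; v \rangle}{E \vdash \langle M; \mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \rangle \Downarrow \langle M_3; v \rangle}\;\text{(SIf-False)}$$

$$\frac{E \vdash \langle M; e_1 \rangle \Downarrow \langle M_1; v_1 \rangle \quad E, x : v_1 \vdash \langle M_1; e_2 \rangle \Downarrow \langle M_2; v_2 \rangle}{E \vdash \langle M; \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 \rangle \Downarrow \langle M_2; v_2 \rangle}\;\text{(SLet)}$$

$$\frac{E \vdash \langle M; e_1 \rangle \Downarrow \langle M_1; v_1 \rangle \quad l \notin \mathrm{dom}(M_1)}{E \vdash \langle M; \mathsf{ref}\ e_1 \rangle \Downarrow \langle M_1[l \mapsto v_1]; l \rangle}\;\text{(SRef)}$$

$$\frac{E \vdash \langle M; e_1 \rangle \Downarrow \langle M_1; l \rangle \quad E \vdash \langle M_1; e_2 \rangle \Downarrow \langle M_2; v_2 \rangle \quad l \in \mathrm{dom}(M_2)}{E \vdash \langle M; e_1 := e_2 \rangle \Downarrow \langle M_2[l \mapsto v_2]; v_2 \rangle}\;\text{(SAssign)}$$

$$\frac{E \vdash \langle M; e_1 \rangle \Downarrow \langle M_1; l_1 \rangle \quad l_1 \in \mathrm{dom}(M_1)}{E \vdash \langle M; !e_1 \rangle \Downarrow \langle M_1; M_1(l_1) \rangle}\;\text{(SDeref)}$$

$$\frac{E \vdash \langle M; e_1 \rangle \Downarrow \langle M_1; v_1 \rangle}{E \vdash \langle M; \{s\ e_1\ s\} \rangle \Downarrow \langle M_1; v_1 \rangle}\;\text{(SSymBlock)} \qquad \frac{E \vdash \langle M; e_1 \rangle \Downarrow \langle M_1; v_1 \rangle}{E \vdash \langle M; \{t\ e_1\ t\} \rangle \Downarrow \langle M_1; v_1 \rangle}\;\text{(STypBlock)}$$

Figure 5. Standard big-step operational semantics.

Type Checking. Judgment form: $\Gamma \vdash e : \tau$

$$\frac{}{\Gamma, x : \tau \vdash x : \tau}\;\text{(TVar)} \qquad \frac{}{\Gamma \vdash n : \mathsf{int}}\;\text{(TInt)} \qquad \frac{}{\Gamma \vdash b : \mathsf{bool}}\;\text{(TBool)}$$

$$\frac{\Gamma \vdash e_1 : \mathsf{int} \quad \Gamma \vdash e_2 : \mathsf{int}}{\Gamma \vdash e_1 + e_2 : \mathsf{int}}\;\text{(TPlus)} \qquad \frac{\Gamma \vdash e_1 : \tau \quad \Gamma \vdash e_2 : \tau}{\Gamma \vdash e_1 = e_2 : \mathsf{bool}}\;\text{(TEq)}$$

$$\frac{\Gamma \vdash e : \mathsf{bool}}{\Gamma \vdash \neg e : \mathsf{bool}}\;\text{(TNot)} \qquad \frac{\Gamma \vdash e_1 : \mathsf{bool} \quad \Gamma \vdash e_2 : \mathsf{bool}}{\Gamma \vdash e_1 \wedge e_2 : \mathsf{bool}}\;\text{(TAnd)}$$

$$\frac{\Gamma \vdash e_1 : \mathsf{bool} \quad \Gamma \vdash e_2 : \tau \quad \Gamma \vdash e_3 : \tau}{\Gamma \vdash \mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 : \tau}\;\text{(TIf)} \qquad \frac{\Gamma \vdash e_1 : \tau_1 \quad \Gamma, x : \tau_1 \vdash e_2 : \tau_2}{\Gamma \vdash \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 : \tau_2}\;\text{(TLet)}$$

$$\frac{\Gamma \vdash e : \tau}{\Gamma \vdash \mathsf{ref}\ e : \tau\ \mathsf{ref}}\;\text{(TRef)} \qquad \frac{\Gamma \vdash e_1 : \tau\ \mathsf{ref} \quad \Gamma \vdash e_2 : \tau}{\Gamma \vdash e_1 := e_2 : \tau}\;\text{(TAssign)} \qquad \frac{\Gamma \vdash e : \tau\ \mathsf{ref}}{\Gamma \vdash !e : \tau}\;\text{(TDeref)}$$

Figure 6. Standard type checking rules.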

of $M$ so that $l \mapsto v$ (depending on whether or not $l \in \mathrm{dom}(M)$, respectively). We consider the set of locations to be included in the set of values. The basic form of the evaluation judgment

$$E \vdash \langle M; e \rangle \Downarrow \langle M'; v \rangle$$

says that in a concrete environment $E$, an initial memory $M$ and an expression $e$ evaluate to a resulting memory $M'$ and a value $v$. A concrete environment maps variables to values (i.e., we define $E ::= \emptyset \mid E, x : v$). To indicate boolean values, we use the metavariable $b$ (i.e., we let $b ::= \mathsf{false} \mid \mathsf{true}$). To reiterate and make it explicit, we work with the following semantic domains:


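$$\begin{array}{rcll}
v & \in & \mathrm{Val} & \text{concrete values}\\
x & \in & \mathrm{Var} & \text{program variables}\\
l & \in & \mathrm{Loc} \subseteq \mathrm{Val} & \text{memory locations}\\
E & \in & \mathrm{Var} \rightharpoonup_{\mathrm{fin}} \mathrm{Val} = \mathrm{Env} & \text{concrete environments}\\
M & \in & \mathrm{Loc} \rightharpoonup_{\mathrm{fin}} \mathrm{Val} = \mathrm{Mem} & \text{concrete memories}
\end{array}$$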

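In addition to the rules shown in Figure 5, there are also rules that produce error when none of these rules apply, which is also standard. We define

$$r ::= \langle M; v \rangle \mid \mathsf{error}$$

as the result of an execution, so the general form of our execution judgment is as follows:

$$E \vdash \langle M; e \rangle \Downarrow r$$

For example, we have the following error rule for $\neg e$:

$$\frac{E \vdash \langle M; e_1 \rangle \Downarrow r_1 \quad r_1 \neq \langle M_1; b_1 \rangle}{E \vdash \langle M; \neg e_1 \rangle \Downarrow \mathsf{error}}\;\text{(SNot-Error)}$$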
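To make the rules concrete, here is a minimal evaluator for a memory-free fragment of the language (literals, $+$, $=$, $\neg$, $\wedge$, and if), written as a sketch under our own encoding: values are tagged integers, and a `result` carries an error flag standing for $r ::= \langle M; v \rangle \mid \mathsf{error}$. Variables, let, ref, :=, and ! are omitted for brevity. Operands are evaluated left to right, $\wedge$ evaluates both operands as in SAnd, and any premise that fails to produce a value of the expected kind yields error, in the spirit of SNot-Error.

```c
#include <stddef.h>

/* Hypothetical encoding of a memory-free fragment of the language. */
typedef enum { V_INT, V_BOOL } vkind;
typedef struct { vkind kind; int n; } value;      /* n holds the int, or 0/1 */
typedef struct { int is_error; value v; } result; /* r ::= v | error */

typedef enum { E_VAL, E_PLUS, E_EQ, E_NOT, E_AND, E_IF } ekind;
typedef struct expr {
    ekind kind;
    value v;                      /* literal, for E_VAL */
    const struct expr *a, *b, *c; /* subexpressions */
} expr;

static result ok(value v) { result r = {0, v}; return r; }
static result err(void) { result r = {1, {V_INT, 0}}; return r; }

static result eval(const expr *e) {
    result r1, r2;
    switch (e->kind) {
    case E_VAL:                                  /* SVal */
        return ok(e->v);
    case E_PLUS:                                 /* SPlus */
        r1 = eval(e->a);
        if (r1.is_error || r1.v.kind != V_INT) return err();
        r2 = eval(e->b);
        if (r2.is_error || r2.v.kind != V_INT) return err();
        return ok((value){V_INT, r1.v.n + r2.v.n});
    case E_EQ:                                   /* SEq: compare kind and payload */
        r1 = eval(e->a);
        if (r1.is_error) return err();
        r2 = eval(e->b);
        if (r2.is_error) return err();
        return ok((value){V_BOOL, r1.v.kind == r2.v.kind && r1.v.n == r2.v.n});
    case E_NOT:                                  /* SNot, SNot-Error */
        r1 = eval(e->a);
        if (r1.is_error || r1.v.kind != V_BOOL) return err();
        return ok((value){V_BOOL, !r1.v.n});
    case E_AND:                                  /* SAnd: both operands evaluated */
        r1 = eval(e->a);
        if (r1.is_error || r1.v.kind != V_BOOL) return err();
        r2 = eval(e->b);
        if (r2.is_error || r2.v.kind != V_BOOL) return err();
        return ok((value){V_BOOL, r1.v.n && r2.v.n});
    case E_IF:                                   /* SIf-True, SIf-False */
        r1 = eval(e->a);
        if (r1.is_error || r1.v.kind != V_BOOL) return err();
        return eval(r1.v.n ? e->b : e->c);
    }
    return err();
}
```

For instance, `if 1 + 2 = 3 then true else false` evaluates to true, while `not 1` evaluates to error because its premise does not produce a boolean.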




A.2 Type Soundness

Figure 6 shows the type checking rules for our language. Again, these rules are entirely standard. As usual, we introduce a memory type environment $\Delta$ that maps locations to types, and we update the typing judgment to carry this additional environment:

$$\Gamma \vdash_\Delta e : \tau$$

Because $\Delta$ is constant in all rules, we elide it except where needed for clarity of presentation. The memory type environment $\Delta$ is used to assign types to locations $l$; specifically, we add the following typing rule:

$$\frac{}{\Gamma \vdash_\Delta l : \Delta(l)\ \mathsf{ref}}\;\text{(TLoc)}$$
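The rules of Figure 6 translate directly into a recursive checker. The sketch below covers only the fragment without variables, let, or references, so neither $\Gamma$ nor $\Delta$ (and hence neither TVar nor TLoc) is needed. The `expr` encoding is our own, and `T_NONE` marks an expression with no typing derivation; for example, $\neg 1$ has no type here, just as it has no derivation in Figure 6.

```c
#include <stddef.h>

/* Hypothetical encoding of the variable- and reference-free fragment. */
typedef enum { T_INT, T_BOOL, T_NONE } type; /* T_NONE: no derivation exists */
typedef enum { E_INT, E_BOOL, E_PLUS, E_EQ, E_NOT, E_AND, E_IF } ekind;
typedef struct expr {
    ekind kind;
    int lit;                      /* literal value for E_INT / E_BOOL */
    const struct expr *a, *b, *c; /* subexpressions */
} expr;

static type check(const expr *e) {
    type t1, t2, t3;
    switch (e->kind) {
    case E_INT:  return T_INT;                                  /* TInt */
    case E_BOOL: return T_BOOL;                                 /* TBool */
    case E_PLUS:                                                /* TPlus */
        return (check(e->a) == T_INT && check(e->b) == T_INT) ? T_INT : T_NONE;
    case E_EQ:                                                  /* TEq: same type on both sides */
        t1 = check(e->a);
        t2 = check(e->b);
        return (t1 != T_NONE && t1 == t2) ? T_BOOL : T_NONE;
    case E_NOT:                                                 /* TNot */
        return check(e->a) == T_BOOL ? T_BOOL : T_NONE;
    case E_AND:                                                 /* TAnd */
        return (check(e->a) == T_BOOL && check(e->b) == T_BOOL) ? T_BOOL : T_NONE;
    case E_IF:                                                  /* TIf: branches agree */
        t1 = check(e->a);
        t2 = check(e->b);
        t3 = check(e->c);
        return (t1 == T_BOOL && t2 != T_NONE && t2 == t3) ? t2 : T_NONE;
    }
    return T_NONE;
}
```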

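We write $\Delta' \supseteq \Delta$ if $\Delta'$ is an extension of $\Delta$. Formally, $\Delta' \supseteq \Delta$ if

$$\mathrm{dom}(\Delta') \supseteq \mathrm{dom}(\Delta) \quad\text{and}\quad \Delta'(l) = \Delta(l) \text{ for all } l \in \mathrm{dom}(\Delta).$$

We also use the same notation for other mapping extensions.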
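The extension relation is a simple pointwise check. The sketch below assumes (our encoding, not the paper's) that a memory typing is an array indexed by location, with `TY_ABSENT` marking locations outside its domain. Then $\Delta'$ extends $\Delta$ exactly when every location typed by $\Delta$ has the same type in $\Delta'$; extra locations in $\Delta'$ are allowed.

```c
/* Hypothetical encoding: a memory typing maps location indices to types,
   with TY_ABSENT for locations outside its domain. */
typedef enum { TY_ABSENT, TY_INT, TY_BOOL, TY_INT_REF, TY_BOOL_REF } ty;

/* Returns 1 if d_new extends d_old: for every location in dom(d_old),
   d_new assigns the same type. A location missing from d_new also fails,
   because TY_ABSENT differs from the old (present) type. */
static int extends(const ty *d_new, const ty *d_old, int nlocs) {
    for (int l = 0; l < nlocs; l++)
        if (d_old[l] != TY_ABSENT && d_new[l] != d_old[l])
            return 0;
    return 1;
}
```

Every typing extends itself, and adding a newly allocated location (as SRef does) preserves the relation, which is why Theorem 3 can quantify over some $\Delta' \supseteq \Delta$.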

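To prove type soundness, we define a soundness relation that says that the values in the concrete environment and concrete memory are consistent with the type environments (Definition 2).

Definition 2 (Type State Soundness Relation)

We define soundness relations for type environments as follows:

$$\begin{array}{lcl}
\langle E; M \rangle \sim \langle \Gamma; \Delta \rangle & \text{if} & E \sim \langle \Gamma; \Delta \rangle \text{ and } M \sim \Delta\\
E \sim \langle \Gamma; \Delta \rangle & \text{if} & \emptyset \vdash_\Delta E(x) : \Gamma(x) \text{ for all } x \in \mathrm{dom}(E) = \mathrm{dom}(\Gamma)\\
M \sim \Delta & \text{if} & \emptyset \vdash_\Delta M(l) : \Delta(l) \text{ for all } l \in \mathrm{dom}(M) = \mathrm{dom}(\Delta)
\end{array}$$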

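The following theorem states type soundness and is proved only for the rules in Figure 6 (i.e., for everything except the nested block rule in Figure 4).

Theorem 3 (Type Soundness)

If

$$E \vdash \langle M; e \rangle \Downarrow r \quad\text{and}\quad \Gamma \vdash_\Delta e : \tau \quad\text{such that}\quad \langle E; M \rangle \sim \langle \Gamma; \Delta \rangle,$$

then $\emptyset \vdash_{\Delta'} v : \tau$ and $M' \sim \Delta'$ for some $M'$, $v$, and $\Delta'$ such that $r = \langle M'; v \rangle$ and $\Delta' \supseteq \Delta$.

Proof

By induction on the derivation of $E \vdash \langle M; e \rangle \Downarrow r$.

Notice that the above theorem says that $r$, the result of evaluating a well-typed expression, cannot be $\mathsf{error}$.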

A.3 Symbolic Execution Soundness

Intuitively, symbolic execution begins with symbolic values $\alpha$ for unknown inputs and accumulates a symbolic expression $s$ that represents the result of the program along a path. Thus, at a high level, a symbolic execution is sound along a path if interpreting the symbolic result under an assignment to the symbolic values (i.e., the unknowns) yields the same value as the concrete execution along that path.

To capture this assignment to symbolic values, or valuation, we write

$$V : \mathrm{SymVal} \rightharpoonup_{\mathrm{fin}} \mathrm{Val} \cup \mathrm{Mem}$$

for a finite mapping from symbolic values $\alpha$ (drawn from set SymVal) to concrete values $v$ (from Val) or concrete memories

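Concretization.

$$\begin{array}{rcl}
[\![\alpha]\!]^V & = & V(\alpha)\\
[\![v]\!]^V & = & v\\
[\![u:\tau]\!]^V & = & [\![u]\!]^V\\
[\![u_1:\mathsf{int} + u_2:\mathsf{int}]\!]^V & = & [\![u_1]\!]^V + [\![u_2]\!]^V\\
[\![s_1 = s_2]\!]^V & = & ([\![s_1]\!]^V = [\![s_2]\!]^V)\\
[\![\neg s]\!]^V & = & \neg [\![s]\!]^V\\
[\![s_1 \wedge s_2]\!]^V & = & [\![s_1]\!]^V \wedge [\![s_2]\!]^V\\
[\![m[u:\tau\ \mathsf{ref}]]\!]^V & = & [\![m]\!]^V([\![u]\!]^V)\\[4pt]
[\![\mu]\!]^V & = & V(\mu)\\
[\![m, (s_1 \to s_2)]\!]^V & = & [\![m]\!]^V\,[[\![s_1]\!]^V \mapsto [\![s_2]\!]^V]\\
[\![m, (s_1 \hookrightarrow s_2)]\!]^V & = & [\![m]\!]^V\,[[\![s_1]\!]^V \mapsto [\![s_2]\!]^V]\\[4pt]
[\![\emptyset]\!]^V & = & \emptyset\\
[\![\Sigma, x:s]\!]^V & = & [\![\Sigma]\!]^V, x:[\![s]\!]^V
\end{array}$$

Figure 7. Interpretation of symbolic expressions, symbolic memories, and symbolic environments.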


$M$ (from Mem). Recall that a concrete environment $E : \mathrm{Var} \rightharpoonup_{\mathrm{fin}} \mathrm{Val}$ is a finite mapping from variables to values, and a concrete memory $M : \mathrm{Loc} \rightharpoonup_{\mathrm{fin}} \mathrm{Val}$ is a finite mapping from locations to values.

Given a valuation $V$, the interpretation of symbolic expressions, symbolic memories, and symbolic environments is largely as expected; we define it in Figure 7. We write the following for these interpretations:

$$[\![s]\!]^V = v \qquad [\![m]\!]^V = M \qquad [\![\Sigma]\!]^V = E$$

The interpretation of symbolic expressions ignores the type annotation, though it would be ill-defined if the symbolic expressions were not well-typed with respect to the valuation $V$.
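For the expression part of Figure 7, the interpretation is a straightforward recursive function. The sketch below assumes integer-valued symbolic values (booleans as 0/1), models the valuation $V$ as an array indexed by symbolic value, and drops type annotations since the interpretation ignores them; symbolic memories and environments are omitted. The encoding and names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical encoding of symbolic expressions without memories. */
typedef enum { S_SYM, S_CONST, S_PLUS, S_EQ, S_NOT, S_AND } skind;
typedef struct sexpr {
    skind kind;
    int id;                    /* S_SYM: index of the symbolic value alpha */
    int n;                     /* S_CONST: the concrete value */
    const struct sexpr *a, *b; /* subexpressions */
} sexpr;

/* [[s]]^V, with V given as an array mapping each alpha index to a value. */
static int interp(const sexpr *s, const int *valuation) {
    switch (s->kind) {
    case S_SYM:   return valuation[s->id];                               /* V(alpha) */
    case S_CONST: return s->n;                                           /* v */
    case S_PLUS:  return interp(s->a, valuation) + interp(s->b, valuation);
    case S_EQ:    return interp(s->a, valuation) == interp(s->b, valuation);
    case S_NOT:   return !interp(s->a, valuation);
    case S_AND:   return interp(s->a, valuation) && interp(s->b, valuation);
    }
    return 0;
}
```

The same symbolic result $(\alpha_0 + 1) = \alpha_1$ concretizes to true under $V = \{\alpha_0 \mapsto 4, \alpha_1 \mapsto 5\}$ and to false under $\{\alpha_0 \mapsto 4, \alpha_1 \mapsto 6\}$, which is why soundness is stated relative to a valuation.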

Our typed symbolic execution tracks the type of symbolic expressions (i.e., the type of the value under any valuation that respects the types of the unknowns) and halts upon encountering ill-typed expressions. Observe that this behavior matches the concrete executor (i.e., the big-step operational semantics in Figure 5). This typing of symbolic expressions must be with respect to a memory typing $\Delta$ and thus must be part of the soundness relation. Moreover, the symbolic executor distinguishes between the locations in the arbitrary memory on entry and the locations that come from allocations during execution, which must be reflected in the symbolic state soundness relation.

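Definition 4 (Symbolic State Soundness Relation)

We define soundness relations for environment-memory pairs and memory-value pairs using the concretization defined above; these relations are parametrized by a valuation $V$, a memory typing $\Delta_0$ for the arbitrary memory on entry, and the memory typing $\Delta$ for locations allocated during symbolic execution.

$$\begin{array}{lcl}
\langle E; M \rangle \sim_{\Delta_0; V; \Delta} \langle \Sigma; m \rangle & \text{if} & [\![\Sigma]\!]^V = E \text{ and } E \sim \langle \Gamma; \Delta_0 * \Delta \rangle \text{ and } \vdash \Sigma : \Gamma \text{ for some } \Gamma \text{ and}\\
 & & [\![m]\!]^V = M \text{ and } \Delta_0 \vdash_V m : \Delta_0 * \Delta\\[4pt]
\langle M; v \rangle \sim_{\Delta_0; V; \Delta} \langle m; u:\tau \rangle & \text{if} & [\![m]\!]^V = M \text{ and } \Delta_0 \vdash_V m : \Delta_0 * \Delta \text{ and}\\
 & & [\![u:\tau]\!]^V = v \text{ and } \emptyset \vdash_{\Delta_0 * \Delta} v : \tau
\end{array}$$

where we write $\Delta_1 * \Delta_2$ to mean the memory typing that is the union of submemory typings $\Delta_1$ and $\Delta_2$ with disjoint domains.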
There are two pieces to these definitions:

1. The interpretation of a symbolic part under the valuation $V$ yields the concrete part (e.g., $[\![\Sigma]\!]^V = E$).

2. Under the valuation $V$, all symbolic expressions (e.g., $u:\tau$) correspond to values that are well-typed in a concrete memory with typing $\Delta_0 * \Delta$ (e.g., $\emptyset \vdash_{\Delta_0 * \Delta} [\![u:\tau]\!]^V : \tau$).

To check this second piece for writes and allocations in $m$, the soundness relations depend on the following auxiliary judgment on symbolic memories:

$$\Delta \vdash_V m : \Delta'$$

The above judgment says that under a valuation $V$, a symbolic memory $m$ corresponds to a concrete memory whose location types are given by $\Delta'$, and all symbolic expressions that record writes and allocations are well-typed according to $\Delta'$. The memory typing $\Delta$ gives a hypothesis on the location types of the initial memory $\mu$. In summary, we can show that if $\Delta \vdash_V m : \Delta'$, then $\mathrm{dom}(\Delta') = \mathrm{dom}([\![m]\!]^V)$. The rules that define the above judgment are as follows:

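MTHyp
$$\frac{V(\mu) \sim \Lambda_0}{\Lambda_0 \vdash_V \mu : \Lambda_0}$$

MTUpdate
$$\frac{\Lambda_0 \vdash_V m : \Lambda \qquad \emptyset \vdash_{\Lambda} [\![u_1{:}\tau_1]\!]^V : \tau_1 \qquad \emptyset \vdash_{\Lambda} [\![u_2{:}\tau_2]\!]^V : \tau_2}{\Lambda_0 \vdash_V m, (u_1{:}\tau_1 \mapsto u_2{:}\tau_2) : \Lambda}$$

MTAlloc
$$\frac{\Lambda_0 \vdash_V m : \Lambda \qquad \emptyset \vdash_{\Lambda} [\![u{:}\tau]\!]^V : \tau \qquad V(\alpha) \notin \mathrm{dom}([\![m]\!]^V)}{\Lambda_0 \vdash_V m, (\alpha{:}\tau\ \mathsf{ref} \mapsto_a u{:}\tau) : (\Lambda, V(\alpha){:}\tau)}$$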

Rule MTHyp asserts that the concrete initial memory given by $V(\mu)$ is typed by $\Lambda_0$. Then, rules MTUpdate and MTAlloc check that the symbolic expressions in the update-allocation log are well-typed, that is, the concretization of each symbolic expression has the type claimed by its typing annotation (e.g., observe $\tau_1$ in the premise $\emptyset \vdash_{\Lambda} [\![u_1{:}\tau_1]\!]^V : \tau_1$ of MTUpdate). However, this judgment does not ensure that writes are well-typed (i.e., in MTUpdate, types $\tau_1$ and $\tau_2$ do not need to be compatible), as symbolic execution allows temporarily type-inconsistent memory.
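The three rules can be read as a single pass over the log. The Python sketch below assumes a deliberately small encoding: a symbolic expression is either a symbolic variable or a concrete value, locations are `Loc` objects, types are `"int"`, `"bool"`, or `("ref", tau)`, and the log is a list whose first entry names the initial memory $\mu$. All names (`Var`, `Loc`, `conc`, `typed`, `memory_typing`) are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical simplified encoding: a symbolic expression is either a
# symbolic variable Var or a concrete value; locations are Loc objects.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Loc:
    addr: int


def conc(u, V):
    """[[u]]^V for this tiny expression language."""
    return V[u.name] if isinstance(u, Var) else u


def typed(v, tau, lam):
    """Empty-context value typing under memory typing lam.

    Types are "int", "bool", or ("ref", tau).
    """
    if isinstance(v, Loc):
        return isinstance(tau, tuple) and tau[0] == "ref" and lam.get(v) == tau[1]
    if tau == "bool":
        return isinstance(v, bool)
    if tau == "int":
        return isinstance(v, int) and not isinstance(v, bool)
    return False


def memory_typing(lam0, V, log):
    """Return lam with lam0 |-_V log : lam, or raise ValueError.

    log[0] is ("hyp", mu); later entries are ("upd", u1, t1, u2, t2) for a
    logged write (u1:t1 |-> u2:t2) or ("alloc", alpha, tau, u) for an allocation.
    """
    kind, mu = log[0]
    if kind != "hyp":
        raise ValueError("log must start with the initial memory mu")
    init = V[mu]  # the concrete initial memory V(mu)
    # MTHyp: V(mu) ~ lam0, i.e. same domain and every cell well-typed.
    if init.keys() != lam0.keys() or not all(typed(init[l], lam0[l], lam0) for l in init):
        raise ValueError("MTHyp: V(mu) is not typed by lam0")
    lam, dom = dict(lam0), set(init)
    for entry in log[1:]:
        if entry[0] == "upd":
            # MTUpdate: each side matches its own annotation; t1 and t2 need
            # not be compatible, so temporarily inconsistent writes pass.
            _, u1, t1, u2, t2 = entry
            if not (typed(conc(u1, V), t1, lam) and typed(conc(u2, V), t2, lam)):
                raise ValueError("MTUpdate: annotation mismatch")
        elif entry[0] == "alloc":
            # MTAlloc: V(alpha) is fresh and the initializer has type tau.
            _, alpha, tau, u = entry
            loc = V[alpha]
            if loc in dom or not typed(conc(u, V), tau, lam):
                raise ValueError("MTAlloc: reused location or ill-typed initializer")
            lam[loc] = tau
            dom.add(loc)
        else:
            raise ValueError(f"unknown log entry: {entry[0]}")
    return lam
```

For example, a log that allocates a `bool` cell and then writes a `bool` through an `int ref` is accepted, because MTUpdate checks each side only against its own annotation; a log whose allocation reuses an existing location is rejected by the MTAlloc check.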

Temporarily Type-Inconsistent Memory. A property of symbolic execution is that it does not require all writes to be consistent with the memory typing (i.e., $\Lambda_0 * \Lambda$), as long as consistency is restored before assigning types to expressions that use memory. This temporary violation of the memory typing invariant allows us to obtain, for example, flow-sensitive type checking. The memory type consistency judgment checks that the memory typing invariant is restored, and its soundness is stated as follows:


Lemma  5  (Memory  Type  Consistency  Soundness) 

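If $\vdash m\ \mathsf{ok}\ U$ and $\Lambda_0 \vdash_V m : \Lambda$, then
$$\emptyset \vdash_{\Lambda} [\![m]\!]^V(l) : \Lambda(l) \quad \text{for all } l \in \mathrm{dom}(\Lambda) \setminus L$$
where
$$L = \{\, [\![s_1]\!]^V \mid (s_1 \mapsto s_2) \in U \,\}.$$
As a corollary, if $\vdash m\ \mathsf{ok}$ and $\Lambda_0 \vdash_V m : \Lambda$, then $[\![m]\!]^V \sim \Lambda$.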

Proof 

By induction on the derivation of $\vdash m\ \mathsf{ok}\ U$.

Informally, the above says that if we have a symbolic memory $m$ that we have checked for consistency (i.e., $\vdash m\ \mathsf{ok}\ U$), a memory typing invariant $\Lambda$, and a valuation $V$, then the concrete memory under the valuation $V$ is consistent with $\Lambda$ at all locations except perhaps those in $L$, the locations written by the potentially inconsistent updates in $U$. When $U$ is empty, we know we have restored the memory typing invariant.
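The conclusion of Lemma 5 can be checked directly on concrete memories. The Python sketch below does only that: it does not implement the $\vdash m\ \mathsf{ok}\ U$ judgment, whose rules are not reproduced here. Locations are plain strings, and the names `typed`, `consistent_except`, and `apply_writes` are hypothetical.

```python
def typed(v, tau):
    """Value typing for the base types used in this sketch."""
    if tau == "bool":
        return isinstance(v, bool)
    if tau == "int":
        return isinstance(v, int) and not isinstance(v, bool)
    return False


def consistent_except(memory, lam, exempt):
    """Lemma 5's conclusion: each l in dom(lam) minus exempt holds a value of type lam[l]."""
    return all(typed(memory[l], lam[l]) for l in lam.keys() - exempt)


def apply_writes(memory, writes):
    """Concretize a write log: later writes to the same location win."""
    result = dict(memory)
    for loc, value in writes:
        result[loc] = value
    return result
```

Writing `False` to the `int` location `"l"` leaves memory consistent only when `"l"` is exempted, as for a potentially inconsistent update in $U$; a later write of an integer restores the invariant, which corresponds to $U$ being empty.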

Soundness of Symbolic Execution along a Path. For the proof of symbolic execution soundness, we need a technical lemma, Lemma 6 (Path Condition Prefix), which says that the executor only adds constraints to the path condition $g(S)$ (i.e., the path condition becomes monotonically stronger).

Lemma  6  (Path  Condition  Prefix) 

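If $\Sigma \vdash \langle S; e\rangle \Downarrow \langle S'; s\rangle$ and $[\![g(S')]\!]^V$, then $[\![g(S)]\!]^V$.

Proof

By induction on the derivation of $\Sigma \vdash \langle S; e\rangle \Downarrow \langle S'; s\rangle$.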
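Lemma 6 holds because the executor only ever conjoins guards onto the path condition. The Python sketch below is a hypothetical executor for literals, symbolic variables, and conditionals only (no memory or typing); it represents a path condition as a tuple of conjuncts so that the prefix property can be checked directly.

```python
# Hypothetical expression encoding:
#   ("lit", v) | ("sym", name) | ("if", cond, then_e, else_e)
# A path condition is a tuple of conjuncts, so conjoining a guard appends to it.


def sym_exec(pc, e):
    """Return every (path condition, symbolic result) pair for e, starting from pc."""
    tag = e[0]
    if tag in ("lit", "sym"):
        return [(pc, e)]
    if tag == "if":
        _, cond, then_e, else_e = e
        results = []
        for pc1, g1 in sym_exec(pc, cond):
            results += sym_exec(pc1 + (g1,), then_e)           # like SEIf-True
            results += sym_exec(pc1 + (("not", g1),), else_e)  # like SEIf-False
        return results
    raise ValueError(f"unsupported expression: {tag}")


def is_prefix(p, q):
    """True iff the conjunct tuple p is a prefix of q."""
    return q[: len(p)] == p
```

Because each returned path condition is the initial tuple followed by more conjuncts, any valuation satisfying a final path condition also satisfies the initial one, which is the content of Lemma 6.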

Finally, we can show the soundness of symbolic execution along a path. Observe the key premise $[\![g(S')]\!]^V$, which says that the path condition accumulated during symbolic execution holds under this valuation. This constraint allows us to state that the given concrete and symbolic executions follow the same path. The following theorem is proven only for the rules in Figure 2 and Figure 3 (i.e., everything excluding the nested block rule in Figure 4).

Theorem  7  (Symbolic  Execution  Soundness) 

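If
$$E \vdash \langle M; e\rangle \to r \quad\text{and}\quad \Sigma \vdash \langle S; e\rangle \Downarrow \langle S'; s\rangle$$
such that
$$(E; M) \sim_{\Lambda_0; V; \Lambda} (\Sigma; m(S)) \quad\text{and}\quad [\![g(S')]\!]^V,$$
then
$$r \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S'); s\rangle$$
for some $V' \supseteq V$ and some $\Lambda_0', \Lambda'$ such that $\Lambda_0' * \Lambda' \supseteq \Lambda_0 * \Lambda$.

Proof

By induction on the derivation of $E \vdash \langle M; e\rangle \to r$.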

Case  SVar 

By assumption, we have that

SVar
$$E, x : v \vdash \langle M; x\rangle \to \langle M; v\rangle$$

The only symbolic execution rule that applies is SEVar.

SEVar
$$\Sigma, x : s \vdash \langle S; x\rangle \Downarrow \langle S; s\rangle$$

By assumption, we have that
$$(E, x : v; M) \sim_{\Lambda_0; V; \Lambda} (\Sigma, x : s; m(S)),$$
so trivially, we have that
$$\langle M; v\rangle \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S); s\rangle$$
by choosing $V' = V$, $\Lambda_0' = \Lambda_0$, and $\Lambda' = \Lambda$.

Case  SVal 
Trivial. 

Case  SPlus 

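By assumption, we have that

SPlus
$$\frac{\mathcal{D}_1 :: E \vdash \langle M; e_1\rangle \to \langle M_1; n_1\rangle \qquad \mathcal{D}_2 :: E \vdash \langle M_1; e_2\rangle \to \langle M_2; n_2\rangle}{E \vdash \langle M; e_1 + e_2\rangle \to \langle M_2; n_1 + n_2\rangle}$$

The only symbolic execution rule that applies is SEPlus.

SEPlus
$$\frac{\mathcal{E}_1 :: \Sigma \vdash \langle S; e_1\rangle \Downarrow \langle S_1; u_1{:}\mathsf{int}\rangle \qquad \mathcal{E}_2 :: \Sigma \vdash \langle S_1; e_2\rangle \Downarrow \langle S_2; u_2{:}\mathsf{int}\rangle}{\Sigma \vdash \langle S; e_1 + e_2\rangle \Downarrow \langle S_2; (u_1{:}\mathsf{int} + u_2{:}\mathsf{int}){:}\mathsf{int}\rangle}$$

By assumption, we have that $[\![g(S_2)]\!]^V$, so $[\![g(S_1)]\!]^V$ since $g(S_1)$ is a prefix of the path condition $g(S_2)$ (Lemma 6 on $\mathcal{E}_2$). Also, by assumption, we have that
$$(E; M) \sim_{\Lambda_0; V; \Lambda} (\Sigma; m(S)),$$
so
$$\langle M_1; n_1\rangle \sim_{\Lambda_0^1; V_1; \Lambda_1} \langle m(S_1); u_1{:}\mathsf{int}\rangle$$
for some $V_1 \supseteq V$ and some $\Lambda_0^1 * \Lambda_1 \supseteq \Lambda_0 * \Lambda$ by the i.h. on $\mathcal{D}_1$ with $\mathcal{E}_1$. Then, we have that
$$(E; M_1) \sim_{\Lambda_0^1; V_1; \Lambda_1} (\Sigma; m(S_1))$$
and $[\![g(S_2)]\!]^{V_1}$ because $V_1 \supseteq V$, so
$$\langle M_2; n_2\rangle \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S_2); u_2{:}\mathsf{int}\rangle$$
for some $V' \supseteq V_1 \supseteq V$ and some $\Lambda_0' * \Lambda' \supseteq \Lambda_0^1 * \Lambda_1 \supseteq \Lambda_0 * \Lambda$ by the i.h. on $\mathcal{D}_2$ with $\mathcal{E}_2$. Since $V' \supseteq V_1$, we have that $[\![u_1{:}\mathsf{int}]\!]^{V'} = n_1$, so
$$[\![(u_1{:}\mathsf{int} + u_2{:}\mathsf{int}){:}\mathsf{int}]\!]^{V'} = n_1 + n_2$$
and thus
$$\langle M_2; n_1 + n_2\rangle \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S_2); (u_1{:}\mathsf{int} + u_2{:}\mathsf{int}){:}\mathsf{int}\rangle.$$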

Case  SEq,  SNot,  SAnd 
Similar to the case for SPlus.

Case  SIf-True 
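By assumption, we have that

SIf-True
$$\frac{\mathcal{D}_1 :: E \vdash \langle M; e_1\rangle \to \langle M_1; \mathsf{true}\rangle \qquad \mathcal{D}_2 :: E \vdash \langle M_1; e_2\rangle \to \langle M_2; v\rangle}{E \vdash \langle M; \mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3\rangle \to \langle M_2; v\rangle}$$

The only symbolic execution rules that could apply are SEIf-True and SEIf-False.

Sub-case SEIf-True

SEIf-True
$$\frac{\mathcal{E}_1 :: \Sigma \vdash \langle S; e_1\rangle \Downarrow \langle S_1; g_1\rangle \qquad \mathcal{E}_2 :: \Sigma \vdash \langle S_1[g \mapsto g(S_1) \wedge g_1]; e_2\rangle \Downarrow \langle S_2; s_2\rangle}{\Sigma \vdash \langle S; \mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3\rangle \Downarrow \langle S_2; s_2\rangle}$$

By assumption, we have that $[\![g(S_2)]\!]^V$, so $[\![g(S_1) \wedge g_1]\!]^V$ since $g(S_1) \wedge g_1$ is a prefix of the path condition $g(S_2)$ (Lemma 6 on $\mathcal{E}_2$). Thus, we have that $[\![g(S_1)]\!]^V$. By assumption, we have that
$$(E; M) \sim_{\Lambda_0; V; \Lambda} (\Sigma; m(S)),$$
so
$$\langle M_1; \mathsf{true}\rangle \sim_{\Lambda_0^1; V_1; \Lambda_1} \langle m(S_1); g_1\rangle$$
for some $V_1 \supseteq V$ and some $\Lambda_0^1 * \Lambda_1 \supseteq \Lambda_0 * \Lambda$ by the i.h. on $\mathcal{D}_1$ with $\mathcal{E}_1$. We have that
$$(E; M_1) \sim_{\Lambda_0^1; V_1; \Lambda_1} (\Sigma; m(S_1))$$
and $[\![g(S_2)]\!]^{V_1}$ because $V_1 \supseteq V$, so
$$\langle M_2; v\rangle \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S_2); s_2\rangle$$
for some $V' \supseteq V_1 \supseteq V$ and some $\Lambda_0' * \Lambda' \supseteq \Lambda_0^1 * \Lambda_1 \supseteq \Lambda_0 * \Lambda$ by the i.h. on $\mathcal{D}_2$ with $\mathcal{E}_2$.

Sub-case SEIf-False

SEIf-False
$$\frac{\mathcal{E}_1 :: \Sigma \vdash \langle S; e_1\rangle \Downarrow \langle S_1; g_1\rangle \qquad \mathcal{E}_2 :: \Sigma \vdash \langle S_1[g \mapsto g(S_1) \wedge \neg g_1]; e_3\rangle \Downarrow \langle S_3; s_3\rangle}{\Sigma \vdash \langle S; \mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3\rangle \Downarrow \langle S_3; s_3\rangle}$$

By assumption, we have that $[\![g(S_3)]\!]^V$, so $[\![g(S_1) \wedge \neg g_1]\!]^V$ since $g(S_1) \wedge \neg g_1$ is a prefix of the path condition $g(S_3)$ (Lemma 6 on $\mathcal{E}_2$). Thus, we have that $[\![g(S_1)]\!]^V$ and $[\![\neg g_1]\!]^V$. By assumption, we have that
$$(E; M) \sim_{\Lambda_0; V; \Lambda} (\Sigma; m(S)),$$
so
$$\langle M_1; \mathsf{true}\rangle \sim_{\Lambda_0^1; V_1; \Lambda_1} \langle m(S_1); g_1\rangle$$
for some $V_1 \supseteq V$ and some $\Lambda_0^1 * \Lambda_1 \supseteq \Lambda_0 * \Lambda$ by the i.h. on $\mathcal{D}_1$ with $\mathcal{E}_1$. Thus, we have that $[\![g_1]\!]^{V_1}$. We also have that $[\![\neg g_1]\!]^{V_1}$ because $[\![\neg g_1]\!]^V$ and $V_1 \supseteq V$. These two statements cannot both hold, a contradiction, so this sub-case is not possible.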

Case  SIf-False 

Similar  to  the  case  for  SIf-True. 

Case  SLet 

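By assumption, we have that

SLet
$$\frac{\mathcal{D}_1 :: E \vdash \langle M; e_1\rangle \to \langle M_1; v_1\rangle \qquad \mathcal{D}_2 :: E, x : v_1 \vdash \langle M_1; e_2\rangle \to \langle M_2; v_2\rangle}{E \vdash \langle M; \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2\rangle \to \langle M_2; v_2\rangle}$$

The only symbolic execution rule that applies is SELet.

SELet
$$\frac{\mathcal{E}_1 :: \Sigma \vdash \langle S; e_1\rangle \Downarrow \langle S_1; s_1\rangle \qquad \mathcal{E}_2 :: \Sigma, x : s_1 \vdash \langle S_1; e_2\rangle \Downarrow \langle S_2; s_2\rangle}{\Sigma \vdash \langle S; \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2\rangle \Downarrow \langle S_2; s_2\rangle}$$

By assumption, we have that $[\![g(S_2)]\!]^V$, so $[\![g(S_1)]\!]^V$ since $g(S_1)$ is a prefix of the path condition $g(S_2)$ (Lemma 6 on $\mathcal{E}_2$). Also, by assumption, we have that
$$(E; M) \sim_{\Lambda_0; V; \Lambda} (\Sigma; m(S)),$$
so
$$\langle M_1; v_1\rangle \sim_{\Lambda_0^1; V_1; \Lambda_1} \langle m(S_1); s_1\rangle$$
for some $V_1 \supseteq V$ and some $\Lambda_0^1 * \Lambda_1 \supseteq \Lambda_0 * \Lambda$ by the i.h. on $\mathcal{D}_1$ with $\mathcal{E}_1$. Then, because $[\![s_1]\!]^{V_1} = v_1$, we have that
$$(E, x : v_1; M_1) \sim_{\Lambda_0^1; V_1; \Lambda_1} (\Sigma, x : s_1; m(S_1)).$$
Also, $[\![g(S_2)]\!]^{V_1}$ because $V_1 \supseteq V$, so
$$\langle M_2; v_2\rangle \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S_2); s_2\rangle$$
for some $V' \supseteq V_1 \supseteq V$ and some $\Lambda_0' * \Lambda' \supseteq \Lambda_0^1 * \Lambda_1 \supseteq \Lambda_0 * \Lambda$ by the i.h. on $\mathcal{D}_2$ with $\mathcal{E}_2$.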

Case  SRef 

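By assumption, we have that

SRef
$$\frac{\mathcal{D}_1 :: E \vdash \langle M; e_1\rangle \to \langle M_1; v_1\rangle \qquad l \notin \mathrm{dom}(M_1)}{E \vdash \langle M; \mathsf{ref}\ e_1\rangle \to \langle M_1[l \mapsto v_1]; l\rangle}$$

The only symbolic execution rule that applies is SERef.

SERef
$$\frac{\mathcal{E}_1 :: \Sigma \vdash \langle S; e_1\rangle \Downarrow \langle S_1; u_1{:}\tau\rangle \qquad \beta \notin \Sigma, S, S_1, u_1 \qquad S' = S_1[m \mapsto m(S_1), (\beta{:}\tau\ \mathsf{ref} \mapsto_a u_1{:}\tau)]}{\Sigma \vdash \langle S; \mathsf{ref}\ e_1\rangle \Downarrow \langle S'; \beta{:}\tau\ \mathsf{ref}\rangle}$$

Also, by assumption, we have that
$$(E; M) \sim_{\Lambda_0; V; \Lambda} (\Sigma; m(S)) \quad\text{and}\quad [\![g(S_1)]\!]^V$$
since $g(S') = g(S_1)$, so
$$\langle M_1; v_1\rangle \sim_{\Lambda_0'; V_1; \Lambda_1} \langle m(S_1); u_1{:}\tau\rangle$$
for some $V_1 \supseteq V$ and some $\Lambda_0' * \Lambda_1 \supseteq \Lambda_0 * \Lambda$ by the i.h. on $\mathcal{D}_1$ with $\mathcal{E}_1$. Since $\beta \notin \Sigma, S, S_1, u_1$, we can assume without loss of generality that $\beta \notin \mathrm{dom}(V_1)$; if it were in $\mathrm{dom}(V_1)$, we could always remove it without affecting $\langle M_1; v_1\rangle \sim_{\Lambda_0'; V_1; \Lambda_1} \langle m(S_1); u_1{:}\tau\rangle$. Now, choose
$$V' = V_1[\beta \mapsto l] \quad\text{and}\quad \Lambda' = \Lambda_1, l{:}\tau.$$
Because $l \notin \mathrm{dom}(M_1)$, we have that
$$\Lambda_0' \vdash_{V'} m(S_1), (\beta{:}\tau\ \mathsf{ref} \mapsto_a u_1{:}\tau) : \Lambda_0' * \Lambda'$$
and thus
$$\langle M_1[l \mapsto v_1]; l\rangle \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S_1), (\beta{:}\tau\ \mathsf{ref} \mapsto_a u_1{:}\tau); \beta{:}\tau\ \mathsf{ref}\rangle.$$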

Case  SAssign 
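By assumption, we have that

SAssign
$$\frac{\mathcal{D}_1 :: E \vdash \langle M; e_1\rangle \to \langle M_1; l\rangle \qquad \mathcal{D}_2 :: E \vdash \langle M_1; e_2\rangle \to \langle M_2; v_2\rangle \qquad l \in \mathrm{dom}(M_2)}{E \vdash \langle M; e_1 := e_2\rangle \to \langle M_2[l \mapsto v_2]; v_2\rangle}$$

The only symbolic execution rule that applies is SEAssign.

SEAssign
$$\frac{\mathcal{E}_1 :: \Sigma \vdash \langle S; e_1\rangle \Downarrow \langle S_1; s_1\rangle \qquad \mathcal{E}_2 :: \Sigma \vdash \langle S_1; e_2\rangle \Downarrow \langle S_2; s_2\rangle}{\Sigma \vdash \langle S; e_1 := e_2\rangle \Downarrow \langle S_2[m \mapsto m(S_2), (s_1 \mapsto s_2)]; s_2\rangle}$$

By assumption, we have that $[\![g(S_2)]\!]^V$, so $[\![g(S_1)]\!]^V$ since $g(S_1)$ is a prefix of the path condition $g(S_2)$ (Lemma 6 on $\mathcal{E}_2$). Also, by assumption, we have that
$$(E; M) \sim_{\Lambda_0; V; \Lambda} (\Sigma; m(S)),$$
so
$$\langle M_1; l\rangle \sim_{\Lambda_0^1; V_1; \Lambda_1} \langle m(S_1); s_1\rangle$$
for some $V_1 \supseteq V$ and some $\Lambda_0^1 * \Lambda_1 \supseteq \Lambda_0 * \Lambda$ by the i.h. on $\mathcal{D}_1$ with $\mathcal{E}_1$. Then, we have that
$$(E; M_1) \sim_{\Lambda_0^1; V_1; \Lambda_1} (\Sigma; m(S_1))$$
and $[\![g(S_2)]\!]^{V_1}$ because $V_1 \supseteq V$, so
$$\langle M_2; v_2\rangle \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S_2); s_2\rangle$$
for some $V' \supseteq V_1$ and some $\Lambda_0' * \Lambda' \supseteq \Lambda_0^1 * \Lambda_1$ by the i.h. on $\mathcal{D}_2$ with $\mathcal{E}_2$. Thus, we have
$$\Lambda_0' \vdash_{V'} m(S_2) : \Lambda_0' * \Lambda'.$$
Since $V' \supseteq V_1$ and $\Lambda_0' * \Lambda' \supseteq \Lambda_0^1 * \Lambda_1$, we have that
$$[\![u_1{:}\tau_1]\!]^{V'} = l \quad\text{and}\quad \emptyset \vdash_{\Lambda_0' * \Lambda'} l : \tau_1,$$
where $s_1 = u_1{:}\tau_1$. Therefore, the symbolic memory relation holds for the symbolic memory with the logged write:
$$\Lambda_0' \vdash_{V'} m(S_2), (s_1 \mapsto s_2) : \Lambda_0' * \Lambda'.$$
Finally, because $l \in \mathrm{dom}(M_2)$, we have that
$$\langle M_2[l \mapsto v_2]; v_2\rangle \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S_2), (s_1 \mapsto s_2); s_2\rangle.$$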

Case  SDeref 
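By assumption, we have that

SDeref
$$\frac{\mathcal{D}_1 :: E \vdash \langle M; e_1\rangle \to \langle M_1; l_1\rangle \qquad l_1 \in \mathrm{dom}(M_1)}{E \vdash \langle M; !e_1\rangle \to \langle M_1; M_1(l_1)\rangle}$$

The only symbolic execution rule that applies is SEDeref.

SEDeref
$$\frac{\mathcal{E}_1 :: \Sigma \vdash \langle S; e_1\rangle \Downarrow \langle S_1; u_1{:}\tau\ \mathsf{ref}\rangle \qquad \mathcal{E}_2 :: \vdash m(S_1)\ \mathsf{ok}}{\Sigma \vdash \langle S; !e_1\rangle \Downarrow \langle S_1; m(S_1)[u_1{:}\tau\ \mathsf{ref}]{:}\tau\rangle}$$

Also, by assumption, we have that
$$(E; M) \sim_{\Lambda_0; V; \Lambda} (\Sigma; m(S)) \quad\text{and}\quad [\![g(S_1)]\!]^V,$$
so
$$\langle M_1; l_1\rangle \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S_1); u_1{:}\tau\ \mathsf{ref}\rangle$$
for some $V' \supseteq V$ and some $\Lambda_0' * \Lambda' \supseteq \Lambda_0 * \Lambda$ by the i.h. on $\mathcal{D}_1$ with $\mathcal{E}_1$. Thus, we have that
$$\Lambda_0' \vdash_{V'} m(S_1) : \Lambda_0' * \Lambda'$$
and
$$\emptyset \vdash_{\Lambda_0' * \Lambda'} [\![u_1{:}\tau\ \mathsf{ref}]\!]^{V'} : \tau\ \mathsf{ref}.$$
By memory type consistency soundness (Lemma 5) on $\mathcal{E}_2$, we have that $[\![m(S_1)]\!]^{V'} \sim \Lambda_0' * \Lambda'$. Therefore, we have that
$$\emptyset \vdash_{\Lambda_0' * \Lambda'} [\![m(S_1)[u_1{:}\tau\ \mathsf{ref}]]\!]^{V'} : \tau$$
as $l_1 \in \mathrm{dom}(M_1)$, which shows
$$\langle M_1; M_1(l_1)\rangle \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S_1); m(S_1)[u_1{:}\tau\ \mathsf{ref}]{:}\tau\rangle.$$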
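This case reads the symbolic memory $m(S_1)$ at $u_1{:}\tau\ \mathsf{ref}$. The Python sketch below shows one way to concretize such a read, assuming, as this case uses, that a read concretizes to a lookup in the replayed log, $[\![m[u]]\!]^V = [\![m]\!]^V([\![u]\!]^V)$. The encoding is hypothetical: locations are integers, symbolic variables are strings looked up in the valuation, and type annotations are dropped.

```python
def concretize_memory(log, V):
    """[[m]]^V: start from V(mu) and replay the log in order.

    log[0] names the initial memory mu; later entries are ("upd", addr, value)
    for a write (s1 |-> s2) or ("alloc", alpha, value) for an allocation.
    Strings are symbolic variables looked up in V; other values are concrete.
    """
    def conc(x):
        return V[x] if isinstance(x, str) else x

    memory = dict(V[log[0]])
    for kind, target, value in log[1:]:
        if kind == "upd":
            memory[conc(target)] = conc(value)
        elif kind == "alloc":
            memory[V[target]] = conc(value)
        else:
            raise ValueError(f"unknown log entry: {kind}")
    return memory


def concretize_read(log, addr, V):
    """[[m[u]]]^V = [[m]]^V([[u]]^V): the read sees the latest write to that location."""
    location = V[addr] if isinstance(addr, str) else addr
    return concretize_memory(log, V)[location]
```

Because the log is replayed in order, a read through `"a"` sees the most recent write whose address concretizes to the same location, even when that write used a different symbolic name.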

Case  SNot-Error 

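By assumption, we have that

SNot-Error
$$\frac{\mathcal{D}_1 :: E \vdash \langle M; e_1\rangle \to r_1 \qquad r_1 \neq \langle M_1; b_1\rangle}{E \vdash \langle M; \neg e_1\rangle \to \mathsf{error}}$$

The only symbolic execution rule that applies is SENot.

SENot
$$\frac{\mathcal{E}_1 :: \Sigma \vdash \langle S; e_1\rangle \Downarrow \langle S_1; g_1\rangle}{\Sigma \vdash \langle S; \neg e_1\rangle \Downarrow \langle S_1; (\neg g_1){:}\mathsf{bool}\rangle}$$

Also, by assumption, we have that
$$(E; M) \sim_{\Lambda_0; V; \Lambda} (\Sigma; m(S)) \quad\text{and}\quad [\![g(S_1)]\!]^V,$$
so
$$r_1 \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S_1); g_1\rangle$$
for some $V' \supseteq V$ and some $\Lambda_0' * \Lambda' \supseteq \Lambda_0 * \Lambda$ by the i.h. on $\mathcal{D}_1$ with $\mathcal{E}_1$. Thus, $r_1 = \langle M_1; v_1\rangle$ for some $M_1$ and $v_1$ such that
$$\emptyset \vdash_{\Lambda_0' * \Lambda'} v_1 : \mathsf{bool}.$$
Therefore, by a standard canonical forms lemma, $v_1 = b_1$ for some $b_1$, which contradicts $r_1 \neq \langle M_1; b_1\rangle$. Therefore, this case is not possible.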

Case  Other  Error  Cases 
Similar  to  SNot-Error. 

Exhaustive Symbolic Execution. With soundness along a path, we can state what it means for symbolic execution to be exhaustive. To do so, we say
$$\mathit{exhaustive}_g(g_1, \ldots, g_n) \quad\text{if}\quad g \Rightarrow g_1 \vee \cdots \vee g_n \text{ is a tautology}.$$
In other words, regardless of the valuation, the path conditions are exhaustive up to an initial guard $g$.
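Checking $\mathit{exhaustive}_g(g_1, \ldots, g_n)$ is a validity question. As a minimal sketch, assuming every path condition is a propositional formula over boolean symbolic variables, the Python code below checks the tautology by enumerating all valuations; path conditions over integers would instead need a decision procedure. The formula encoding and the functions `holds`, `symbols`, and `exhaustive` are hypothetical.

```python
from itertools import product

# Hypothetical formula encoding: ("lit", b) | ("sym", name) | ("not", f)
#                                | ("and", f1, f2) | ("or", f1, f2)


def holds(f, valuation):
    """Evaluate formula f under a valuation mapping symbol names to bools."""
    tag = f[0]
    if tag == "lit":
        return f[1]
    if tag == "sym":
        return valuation[f[1]]
    if tag == "not":
        return not holds(f[1], valuation)
    if tag == "and":
        return holds(f[1], valuation) and holds(f[2], valuation)
    if tag == "or":
        return holds(f[1], valuation) or holds(f[2], valuation)
    raise ValueError(f"unsupported formula: {tag}")


def symbols(f):
    """Collect the symbolic variable names occurring in f."""
    if f[0] == "sym":
        return {f[1]}
    if f[0] == "lit":
        return set()
    return set().union(*(symbols(sub) for sub in f[1:]))


def exhaustive(g, paths):
    """True iff g => g_1 or ... or g_n holds under every boolean valuation."""
    names = sorted(symbols(g).union(*(symbols(p) for p in paths)))
    for bits in product([False, True], repeat=len(names)):
        valuation = dict(zip(names, bits))
        if holds(g, valuation) and not any(holds(p, valuation) for p in paths):
            return False
    return True
```

The three path conditions of a nested conditional on `a` and `b` cover every valuation; dropping the `not a` path leaves a set that is exhaustive only up to the initial guard `a`.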

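Exhaustive symbolic execution, a corollary to symbolic execution soundness (Theorem 7), can be stated as follows. Observe that the premise $[\![g(S)]\!]^V$ only says that the path condition holds in the initial state.

Corollary 7.1 (Exhaustive Symbolic Execution)

Suppose $E \vdash \langle M; e\rangle \to \langle M'; v\rangle$ and we have $n > 0$ symbolic executions
$$\Sigma \vdash \langle S; e\rangle \Downarrow \langle S_i; s_i\rangle \quad (i \in 1..n)$$
such that
$$\mathit{exhaustive}_{g(S)}(g(S_1), \ldots, g(S_n)) \quad\text{and}\quad (E; M) \sim_{\Lambda_0; V; \Lambda} (\Sigma; m(S)) \quad\text{and}\quad [\![g(S)]\!]^V.$$
Then $\langle M'; v\rangle \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S_i); s_i\rangle$ for some $i \in 1..n$, some $V' \supseteq V$, and some $\Lambda_0', \Lambda'$ such that $\Lambda_0' * \Lambda' \supseteq \Lambda_0 * \Lambda$.

Proof

Direct. The meaning of $\mathit{exhaustive}_{g(S)}(g(S_1), \ldots, g(S_n))$ is that $g(S) \Rightarrow g(S_1) \vee \cdots \vee g(S_n)$ is a tautology. Therefore, we have that
$$[\![g(S)]\!]^V \Rightarrow [\![g(S_1) \vee \cdots \vee g(S_n)]\!]^V$$
because the above holds for all valuations. By assumption, $[\![g(S)]\!]^V$, so we have that $[\![g(S_1) \vee \cdots \vee g(S_n)]\!]^V$, and thus $[\![g(S_i)]\!]^V$ holds for at least one $i \in 1..n$. For such an $i$,
$$\langle M'; v\rangle \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S_i); s_i\rangle$$
for some $V' \supseteq V$ and some $\Lambda_0' * \Lambda' \supseteq \Lambda_0 * \Lambda$ by symbolic execution soundness (Theorem 7).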
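The case split at the end of this proof can be read operationally: given the valuation, find a path whose condition it satisfies, then apply Theorem 7 to that path. A hypothetical Python sketch of the selection step, assuming a small boolean formula encoding (an assumption of this sketch, not the paper's representation):

```python
# Hypothetical formula encoding: ("sym", name) | ("not", f) | ("and", f1, f2)


def holds(f, valuation):
    """Evaluate formula f under a valuation mapping symbol names to bools."""
    tag = f[0]
    if tag == "sym":
        return valuation[f[1]]
    if tag == "not":
        return not holds(f[1], valuation)
    if tag == "and":
        return holds(f[1], valuation) and holds(f[2], valuation)
    raise ValueError(f"unsupported formula: {tag}")


def matching_path(path_conditions, valuation):
    """Return the index i of the first g(S_i) that holds under the valuation.

    If the path conditions are exhaustive and the initial guard holds, such
    an i exists; this is the case split in the proof of Corollary 7.1.
    """
    for i, g in enumerate(path_conditions):
        if holds(g, valuation):
            return i
    raise LookupError("no path condition holds: the paths are not exhaustive")
```

With the path conditions `a and b`, `a and not b`, and `not a`, the valuation `{"a": True, "b": False}` selects index 1 and `{"a": False, "b": True}` selects index 2.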

As a technical point, the concerned reader might be worried that we are commingling existentially quantified variables (i.e., the symbolic variables $\alpha$) from different runs of the symbolic executor in $\mathit{exhaustive}(\ldots)$; however, this commingling is permissible, as we are combining with disjuncts and existential quantification distributes over disjunction. At the same time, for debugging in an implementation, we would likely want a fixed, deterministic strategy that ensures that symbolic executions of common prefixes of paths use the same sequence of symbolic variable names.
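One such deterministic strategy, sketched in Python under the assumption that the executor forks a path at each branch, is to keep a per-path counter and copy it at every fork. The `FreshNames` class is a hypothetical illustration, not the paper's implementation.

```python
class FreshNames:
    """Hypothetical per-path generator of symbolic variable names.

    Each path owns a counter; forking copies it, so executions that share a
    path prefix draw the same names along that prefix.
    """

    def __init__(self, prefix="alpha", next_index=0):
        self.prefix = prefix
        self.next_index = next_index

    def fresh(self):
        """Return the next name on this path and advance the counter."""
        name = f"{self.prefix}{self.next_index}"
        self.next_index += 1
        return name

    def fork(self):
        """Start a new path that continues from this path's current counter."""
        return FreshNames(self.prefix, self.next_index)
```

Both branches of a fork draw `alpha2` as their first name after the fork. This is harmless because each path's variables are quantified separately, and rerunning the executor reproduces the same names, which makes traces from different runs comparable.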

A.4  Mix  Soundness 

With type soundness (Theorem 3) and symbolic execution soundness (Theorem 7), we show soundness of Mix by additionally considering the cases for the switching rules.


Theorem  8  (Mix  Soundness) 

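1. If
$$E \vdash \langle M; e\rangle \to r \quad\text{and}\quad \Gamma \vdash_{\Lambda} e : \tau$$
such that
$$(E; M) \sim (\Gamma; \Lambda),$$
then $\emptyset \vdash_{\Lambda'} v : \tau$ and $M' \sim \Lambda'$ for some $M'$, $v$, $\Lambda'$ such that $r = \langle M'; v\rangle$ and $\Lambda' \supseteq \Lambda$.

2. If
$$E \vdash \langle M; e\rangle \to r \quad\text{and}\quad \Sigma \vdash \langle S; e\rangle \Downarrow \langle S'; s\rangle$$
such that
$$(E; M) \sim_{\Lambda_0; V; \Lambda} (\Sigma; m(S)) \quad\text{and}\quad [\![g(S')]\!]^V,$$
then
$$r \sim_{\Lambda_0'; V'; \Lambda'} \langle m(S'); s\rangle$$
for some $V' \supseteq V$ and some $\Lambda_0', \Lambda'$ such that $\Lambda_0' * \Lambda' \supseteq \Lambda_0 * \Lambda$.

Proof

By simultaneous induction on the derivations of
$$E \vdash \langle M; e\rangle \to r.$$
We include the cases from type soundness (Theorem 3) and symbolic execution soundness (Theorem 7) unchanged and consider the additional mix cases (Figure 4).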

Case  SSymBlock 
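By assumption, we have that

SSymBlock
$$\frac{\mathcal{D} :: E \vdash \langle M; e_1\rangle \to \langle M_1; v_1\rangle}{E \vdash \langle M; \{s\ e_1\ s\}\rangle \to \langle M_1; v_1\rangle}$$

The only type checking or symbolic execution rule that applies is the type checking rule TSymBlock.

TSymBlock
$$\frac{\begin{array}{c} \Sigma(x) = \alpha_x{:}\Gamma(x) \ \ (\text{for all } x \in \mathrm{dom}(\Gamma)) \qquad S = \langle \mathsf{true}; \mu\rangle \\ \Sigma \vdash \langle S; e_1\rangle \Downarrow \langle S_i; u_i{:}\tau\rangle \qquad \vdash m(S_i)\ \mathsf{ok} \ \ (i \in 1..n) \qquad \mathit{exhaustive}(g(S_1), \ldots, g(S_n)) \end{array}}{\Gamma \vdash \{s\ e_1\ s\} : \tau}$$

By assumption, we have that $(E; M) \sim (\Gamma; \Lambda)$. Let $V$ be the valuation such that
$$V(\alpha_x) = E(x) \text{ for all } x \in \mathrm{dom}(\Gamma) \quad\text{and}\quad V(\mu) = M.$$
Now, choose $\Lambda_0 = \Lambda$, and we have that
$$(E; M) \sim_{\Lambda_0; V; \emptyset} (\Sigma; m(S)).$$
As we have $\mathit{exhaustive}(g(S_1), \ldots, g(S_n))$, we have that $g(S_1) \vee \cdots \vee g(S_n)$ is a tautology. Therefore, we have that
$$[\![g(S_1) \vee \cdots \vee g(S_n)]\!]^V$$
because the above holds for all valuations. Hence $[\![g(S_i)]\!]^V$ holds for at least one $i \in 1..n$; for such an $i$,
$$\langle M_1; v_1\rangle \sim_{\Lambda_0'; V_1; \Lambda_1} \langle m(S_i); u_i{:}\tau\rangle$$
for some $V_1 \supseteq V$ and some $\Lambda_0' * \Lambda_1 \supseteq \Lambda_0$ by the i.h. on $\mathcal{D}$. By memory type consistency soundness (Lemma 5), we have that
$$[\![m(S_i)]\!]^{V_1} \sim \Lambda_0' * \Lambda_1 \quad\text{and thus}\quad M_1 \sim \Lambda_0' * \Lambda_1.$$
Let $\Lambda' = \Lambda_0' * \Lambda_1$; then we have that
$$\emptyset \vdash_{\Lambda'} v_1 : \tau \quad\text{and}\quad M_1 \sim \Lambda'$$
where $\Lambda' \supseteq \Lambda$.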
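The premises of TSymBlock used in this case can be read as a check over the finished paths. The Python sketch below assumes the symbolic executor has already produced, for each path, its path condition, the type annotation of its result, and whether $\vdash m(S_i)\ \mathsf{ok}$ was derived. `PathResult` and `check_sym_block` are hypothetical names, exhaustiveness is checked by brute force over boolean symbolic variables, and the environment premise $\Sigma(x) = \alpha_x{:}\Gamma(x)$ is not modeled.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional


@dataclass
class PathResult:
    """One symbolic execution path of the block body (hypothetical record)."""
    condition: Callable[[Dict[str, bool]], bool]  # g(S_i) as a predicate on a valuation
    result_type: str                               # tau from the final u_i:tau
    memory_ok: bool                                # whether |- m(S_i) ok was derived


def check_sym_block(paths: List[PathResult], symbols: List[str]) -> Optional[str]:
    """Return tau if every path is ok, all paths agree on tau, and they are exhaustive."""
    if not paths or not all(p.memory_ok for p in paths):
        return None
    types = {p.result_type for p in paths}
    if len(types) != 1:
        return None
    for bits in product([False, True], repeat=len(symbols)):
        valuation = dict(zip(symbols, bits))
        if not any(p.condition(valuation) for p in paths):
            return None  # some valuation follows no explored path
    return types.pop()
```

Two paths guarded by `a` and `not a` that both return `int` give the block type `int`; dropping one path, or letting the paths disagree on the result type, makes the check fail, mirroring the premises that the proof above relies on.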
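Case STypBlock
By assumption, we have that

STypBlock
$$\frac{\mathcal{D}_1 :: E \vdash \langle M; e_1\rangle \to \langle M_1; v_1\rangle}{E \vdash \langle M; \{t\ e_1\ t\}\rangle \to \langle M_1; v_1\rangle}$$

The only type checking or symbolic execution rule that applies is the symbolic execution rule SETypBlock.

SETypBlock
$$\frac{\vdash \Sigma : \Gamma \qquad \mathcal{E}_1 :: \vdash m(S)\ \mathsf{ok} \qquad \mathcal{E}_2 :: \Gamma \vdash e_1 : \tau \qquad \mu', \beta \notin \Sigma, S}{\Sigma \vdash \langle S; \{t\ e_1\ t\}\rangle \Downarrow \langle S[m \mapsto \mu']; \beta{:}\tau\rangle}$$

Also, by assumption, we have that
$$(E; M) \sim_{\Lambda_0; V; \Lambda} (\Sigma; m(S)).$$
In particular, we have that
$$E \sim (\Gamma; \Lambda_0 * \Lambda)$$
because if $\vdash \Sigma : \Gamma$ and $\vdash \Sigma : \Gamma'$, then $\Gamma = \Gamma'$. By memory type consistency soundness (Lemma 5) on $\mathcal{E}_1$, we have that $M \sim \Lambda_0 * \Lambda$, so
$$(E; M) \sim (\Gamma; \Lambda_0 * \Lambda).$$
Then, we have that
$$\emptyset \vdash_{\Lambda_0'} v_1 : \tau \quad\text{and}\quad M_1 \sim \Lambda_0'$$
for some $\Lambda_0' \supseteq \Lambda_0 * \Lambda$ by the i.h. on $\mathcal{D}_1$ with $\mathcal{E}_2$. Since $\mu', \beta \notin \Sigma, S$, we can assume without loss of generality that $\mu', \beta \notin \mathrm{dom}(V)$. Now, choose
$$V' = V[\beta \mapsto v_1, \mu' \mapsto M_1].$$
Then, we have that
$$\Lambda_0' \vdash_{V'} \mu' : \Lambda_0',$$
and thus, we have that
$$\langle M_1; v_1\rangle \sim_{\Lambda_0'; V'; \emptyset} \langle \mu'; \beta{:}\tau\rangle.$$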
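Operationally, SETypBlock discards what the symbolic executor knows about memory: after checking $\vdash m(S)\ \mathsf{ok}$ and type checking the block, it continues with a fresh memory $\mu'$ and a fresh result $\beta{:}\tau$, leaving the path condition alone. The Python sketch below models only that state transition. `SymState` and `step_typed_block` are hypothetical; `memory_ok` stands in for the $\vdash m(S)\ \mathsf{ok}$ judgment, `tau` is the type the caller's type checker assigned to the block, and `fresh_index` is a caller-supplied unused index.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class SymState:
    """Hypothetical symbolic state S = <g; m>."""
    path_condition: Tuple  # g(S) as a tuple of conjuncts
    memory: Tuple          # m(S) as a log whose first element names the initial memory


def step_typed_block(state: SymState, memory_ok: Callable[[Tuple], bool],
                     tau: str, fresh_index: int):
    """Sketch of SETypBlock's effect: return (S[m |-> mu'], beta:tau)."""
    if not memory_ok(state.memory):
        raise ValueError("SETypBlock requires |- m(S) ok")
    fresh_memory = (f"mu_{fresh_index}",)            # the old m(S) is forgotten
    new_state = replace(state, memory=fresh_memory)  # g(S) is kept unchanged
    return new_state, (f"beta_{fresh_index}", tau)
```

The resulting state records nothing about the old memory, which matches the proof above choosing $V' = V[\beta \mapsto v_1, \mu' \mapsto M_1]$ and needing only $\Lambda_0' \vdash_{V'} \mu' : \Lambda_0'$.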

Note that Mix soundness (Theorem 8) as stated above considers symbolic execution along a path. To show soundness of the top-level expression, we can consider an exhaustive constraint as in Corollary 7.1, or simply say that the top-level expression is always wrapped in a type checking block $\{t\ e\ t\}$. In other words, exhaustive symbolic execution of an expression $e$ is type checking of $\{t\ \{s\ e\ s\}\ t\}$.